This QIIME 2 plugin wraps DADA2 and supports sequence quality control for single-end and paired-end reads using the DADA2 R library.
- version: 2026.1.0.dev0
- website: http://benjjneb.github.io/dada2/
- user support: Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
- citations: Callahan et al., 2016
Actions¶
| Name | Type | Short Description |
|---|---|---|
| denoise-single | method | Denoise and dereplicate single-end sequences |
| denoise-paired | method | Denoise and dereplicate paired-end sequences |
| denoise-pyro | method | Denoise and dereplicate single-end pyrosequences |
| denoise-ccs | method | Denoise and dereplicate single-end PacBio CCS |
| plot-base-transitions | visualizer | DADA2 diagnostic statistics |
Artifact Classes¶
- SampleData[DADA2Stats]
- DADA2BaseTransitionStats
Formats¶
- DADA2StatsFormat
- DADA2StatsDirFmt
- DADA2BaseTransitionStatsFormat
- DADA2BaseTransitionStatsDirFmt
dada2 denoise-single¶
This method denoises single-end sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality | PairedEndSequencesWithQuality] The single-end demultiplexed sequences to be denoised.[required]
Parameters¶
- trunc_len:
Int Position at which sequences should be truncated due to a decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed.[required]
- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- max_ee:
Float Reads with a number of expected errors higher than this value will be discarded.[default: 2.0]
- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than trunc_len, it is discarded.[default: 2]
- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default: 'independent']
- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default: 'consensus']
- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). This parameter has no effect if chimera_method is "none".[default: 1.0]
- allow_one_off:
Bool If True, a sequence is also identified as a bimera if it is one mismatch or indel away from an exact bimera.[default: False]
- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default: 1]
- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default: 1000000]
- hashed_feature_ids:
Bool If True, the feature IDs in the resulting table will be presented as hashes of the sequences defining each feature. The hash is always the same for the same sequence, so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default: True]
- retain_all_samples:
Bool If True, all samples input to DADA2 are retained in the output; if False, samples with zero total frequency are removed from the table.[default: True]
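The filtering parameters above (trunc_q, trunc_len, trim_left, max_ee) can be pictured with a small per-read sketch. This is not the plugin's implementation: the function names are hypothetical, the order in which the steps are applied is an assumption made for illustration, and the expected-error count uses the standard Phred relation p = 10^(-Q/10).

```python
def expected_errors(quals):
    """Sum of per-base error probabilities, using p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in quals)


def filter_read(seq, quals, trunc_len, trim_left=0, max_ee=2.0, trunc_q=2):
    """Return the surviving (seq, quals) pair, or None if the read is discarded.

    Hypothetical helper: the step order below is an assumption, not DADA2's code.
    """
    # trunc_q: cut at the first base whose quality is <= trunc_q.
    for i, q in enumerate(quals):
        if q <= trunc_q:
            seq, quals = seq[:i], quals[:i]
            break
    # trunc_len: discard reads that are too short, then cut to length (0 disables).
    if trunc_len > 0:
        if len(seq) < trunc_len:
            return None
        seq, quals = seq[:trunc_len], quals[:trunc_len]
    # trim_left: drop bases from the 5' end.
    seq, quals = seq[trim_left:], quals[trim_left:]
    # max_ee: discard reads with more expected errors than allowed.
    if expected_errors(quals) > max_ee:
        return None
    return seq, quals
```

For example, an 8-base read with uniform quality 30 passed through `filter_read(..., trunc_len=6, trim_left=1)` keeps 5 bases, while a read shorter than `trunc_len` is dropped.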
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
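The hashed_feature_ids behaviour described above can be sketched as follows. The source only says that IDs are "hashes of the sequences"; the use of MD5 here is an assumption, and `feature_id` and `merge_counts` are hypothetical helpers rather than plugin functions.

```python
import hashlib


def feature_id(sequence):
    """Deterministic ID for a sequence (MD5 is an assumption for illustration)."""
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


def merge_counts(*runs):
    """Sum per-feature counts from several runs keyed by hashed feature ID."""
    merged = {}
    for run in runs:
        for fid, count in run.items():
            merged[fid] = merged.get(fid, 0) + count
    return merged


# Two hypothetical runs that both observed the sequence "ACGT".
run_a = {feature_id("ACGT"): 10, feature_id("TTGA"): 3}
run_b = {feature_id("ACGT"): 5}
merged = merge_counts(run_a, run_b)
```

Because the same sequence always maps to the same ID, the counts for "ACGT" from both runs land in one entry of `merged`.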
Examples¶
denoise_single¶
wget -O 'demux-single.qza' \
'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'
qiime dada2 denoise-single \
--i-demultiplexed-seqs demux-single.qza \
--p-trim-left 0 \
--p-trunc-len 120 \
--o-representative-sequences representative-sequences.qza \
--o-table table.qza \
--o-denoising-stats denoising-stats.qza \
--o-base-transition-stats base-transition-stats.qza

from qiime2 import Artifact
from urllib import request
import qiime2.plugins.dada2.actions as dada2_actions
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'
fn = 'demux-single.qza'
request.urlretrieve(url, fn)
demux_single = Artifact.load(fn)
table, representative_sequences, denoising_stats, base_transition_stats = dada2_actions.denoise_single(
    demultiplexed_seqs=demux_single,
    trim_left=0,
    trunc_len=120,
)

- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: demux-single.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 dada2 denoise-single tool:
  - Set "demultiplexed_seqs" to #: demux-single.qza
  - Set "trunc_len" to 120
  - Expand the additional options section
    - Leave "trim_left" as its default value of 0
  - Press the Execute button.

library(reticulate)
Artifact <- import("qiime2")$Artifact
dada2_actions <- import("qiime2.plugins.dada2.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'
fn <- 'demux-single.qza'
request$urlretrieve(url, fn)
demux_single <- Artifact$load(fn)
action_results <- dada2_actions$denoise_single(
demultiplexed_seqs=demux_single,
trim_left=0L,
trunc_len=120L,
)
representative_sequences <- action_results$representative_sequences
table <- action_results$table
denoising_stats <- action_results$denoising_stats
base_transition_stats <- action_results$base_transition_stats
from q2_dada2._examples import denoise_single
denoise_single(use)
dada2 denoise-paired¶
This method denoises paired-end sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[PairedEndSequencesWithQuality] The paired-end demultiplexed sequences to be denoised.[required]
Parameters¶
- trunc_len_f:
Int Position at which forward read sequences should be truncated due to a decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. After this parameter is applied there must still be at least a 12 nucleotide overlap between the forward and reverse reads. If 0 is provided, no truncation or length filtering will be performed.[required]
- trunc_len_r:
Int Position at which reverse read sequences should be truncated due to a decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. After this parameter is applied there must still be at least a 12 nucleotide overlap between the forward and reverse reads. If 0 is provided, no truncation or length filtering will be performed.[required]
- trim_left_f:
Int Position at which forward read sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- trim_left_r:
Int Position at which reverse read sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- max_ee_f:
Float Forward reads with a number of expected errors higher than this value will be discarded.[default: 2.0]
- max_ee_r:
Float Reverse reads with a number of expected errors higher than this value will be discarded.[default: 2.0]
- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than trunc_len_f or trunc_len_r (depending on the direction of the read), it is discarded.[default: 2]
- min_overlap:
Int%Range(4, None) The minimum length of the overlap required for merging the forward and reverse reads.[default: 12]
- max_merge_mismatch:
Int The maximum number of mismatches allowed in the overlap region when merging reads. If 0, only exact overlaps are allowed.[default: 0]
- trim_overhang:
Bool If True, "overhangs" in the alignment after merging are trimmed off. "Overhangs" occur when the reverse read extends past the start of the forward read, and vice versa, as can happen when reads are longer than the amplicon and read into the other-direction primer region.[default: False]
- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default: 'independent']
- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default: 'consensus']
- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). This parameter has no effect if chimera_method is "none".[default: 1.0]
- allow_one_off:
Bool If True, a sequence is also identified as a bimera if it is one mismatch or indel away from an exact bimera.[default: False]
- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default: 1]
- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default: 1000000]
- hashed_feature_ids:
Bool If True, the feature IDs in the resulting table will be presented as hashes of the sequences defining each feature. The hash is always the same for the same sequence, so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default: True]
- retain_all_samples:
Bool If True, all samples input to DADA2 are retained in the output; if False, samples with zero total frequency are removed from the table.[default: True]
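The overlap requirement on trunc_len_f and trunc_len_r comes down to simple arithmetic, sketched below. The helper names and the 250 nt amplicon length are hypothetical, and the sketch assumes both reads start at the ends of the amplicon and ignores trim_left_f and trim_left_r.

```python
def overlap_after_truncation(amplicon_len, trunc_len_f, trunc_len_r):
    """Bases shared by the truncated forward and reverse reads.

    A negative result means the two reads no longer meet in the middle.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def has_required_overlap(amplicon_len, trunc_len_f, trunc_len_r, min_overlap=12):
    """True if the truncated reads still overlap by at least min_overlap bases."""
    return overlap_after_truncation(amplicon_len, trunc_len_f, trunc_len_r) >= min_overlap


# Hypothetical 250 nt amplicon with the truncation lengths used in the example.
overlap = overlap_after_truncation(250, 150, 140)
```

With these numbers the reads overlap by 40 nt, comfortably above the default min_overlap of 12. Truncating both reads to 120 would leave a 10 nt gap and merging would fail.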
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence, and these sequences will be the joined paired-end sequences.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
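To illustrate how min_overlap and max_merge_mismatch shape the joined sequences, here is a deliberately simplified merge sketch. It slides an ungapped overlap and counts mismatches; DADA2 itself aligns the reads, so treat `merge_pair` as a hypothetical stand-in rather than the actual algorithm.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement a DNA string containing A, C, G, T or N."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward, reverse, min_overlap=12, max_merge_mismatch=0):
    """Join a read pair on its longest acceptable ungapped overlap, else None."""
    rc = reverse_complement(reverse)
    # Try the longest possible overlap first, down to min_overlap.
    for overlap in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        tail, head = forward[-overlap:], rc[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_merge_mismatch:
            return forward + rc[overlap:]
    return None


# Hypothetical 40 nt amplicon read from both ends with a 12 nt overlap.
amplicon = "ACGTTGCAAGGCTTAACCGGTTAGCATCGATCGGATCCAT"
forward_read = amplicon[:28]
reverse_read = reverse_complement(amplicon[16:])
merged = merge_pair(forward_read, reverse_read)
```

Raising `min_overlap` above the 12 nt the two reads actually share makes `merge_pair` return `None`, which mirrors a pair that would not merge.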
Examples¶
denoise_paired¶
wget -O 'demux-paired.qza' \
'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'
qiime dada2 denoise-paired \
--i-demultiplexed-seqs demux-paired.qza \
--p-trunc-len-f 150 \
--p-trunc-len-r 140 \
--o-representative-sequences representative-sequences.qza \
--o-table table.qza \
--o-denoising-stats denoising-stats.qza \
--o-base-transition-stats base-transition-stats.qza

from qiime2 import Artifact
from urllib import request
import qiime2.plugins.dada2.actions as dada2_actions
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'
fn = 'demux-paired.qza'
request.urlretrieve(url, fn)
demux_paired = Artifact.load(fn)
table, representative_sequences, denoising_stats, base_transition_stats = dada2_actions.denoise_paired(
    demultiplexed_seqs=demux_paired,
    trunc_len_f=150,
    trunc_len_r=140,
)

- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: demux-paired.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 dada2 denoise-paired tool:
  - Set "demultiplexed_seqs" to #: demux-paired.qza
  - Set "trunc_len_f" to 150
  - Set "trunc_len_r" to 140
  - Press the Execute button.

library(reticulate)
Artifact <- import("qiime2")$Artifact
dada2_actions <- import("qiime2.plugins.dada2.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'
fn <- 'demux-paired.qza'
request$urlretrieve(url, fn)
demux_paired <- Artifact$load(fn)
action_results <- dada2_actions$denoise_paired(
demultiplexed_seqs=demux_paired,
trunc_len_f=150L,
trunc_len_r=140L,
)
representative_sequences <- action_results$representative_sequences
table <- action_results$table
denoising_stats <- action_results$denoising_stats
base_transition_stats <- action_results$base_transition_stats
from q2_dada2._examples import denoise_paired
denoise_paired(use)
dada2 denoise-pyro¶
This method denoises single-end pyrosequencing sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality] The single-end demultiplexed pyrosequencing sequences (e.g. 454, IonTorrent) to be denoised.[required]
Parameters¶
- trunc_len:
Int Position at which sequences should be truncated due to a decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed.[required]
- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- max_ee:
Float Reads with a number of expected errors higher than this value will be discarded.[default: 2.0]
- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than trunc_len, it is discarded.[default: 2]
- max_len:
Int Remove reads that are longer than this value, prior to trimming or truncation. If 0 is provided, no reads will be removed based on length.[default: 0]
- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default: 'independent']
- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default: 'consensus']
- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). This parameter has no effect if chimera_method is "none".[default: 1.0]
- allow_one_off:
Bool If True, a sequence is also identified as a bimera if it is one mismatch or indel away from an exact bimera.[default: False]
- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default: 1]
- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default: 250000]
- hashed_feature_ids:
Bool If True, the feature IDs in the resulting table will be presented as hashes of the sequences defining each feature. The hash is always the same for the same sequence, so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default: True]
- retain_all_samples:
Bool If True, all samples input to DADA2 are retained in the output; if False, samples with zero total frequency are removed from the table.[default: True]
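The role of min_fold_parent_over_abundance in chimera checking can be sketched as a filter on candidate parents. `candidate_parents` is a hypothetical helper, the abundances are made up, and the strict "more abundant than" comparison is an assumption based on the parameter description above.

```python
def candidate_parents(abundances, query, min_fold_parent_over_abundance=1.0):
    """Sequences abundant enough to be considered parents of `query`.

    A sequence qualifies only if its abundance exceeds
    min_fold_parent_over_abundance times the abundance of the query.
    """
    threshold = abundances[query] * min_fold_parent_over_abundance
    return [
        seq
        for seq, count in abundances.items()
        if seq != query and count > threshold
    ]


# Made-up abundances for three sequences in one sample.
abundances = {"seqA": 100, "seqB": 30, "seqC": 10}
parents_default = candidate_parents(abundances, "seqC")
parents_strict = candidate_parents(abundances, "seqC", 3.5)
```

With the default of 1.0 both more-abundant sequences are candidates for "seqC"; at 3.5 only "seqA" (100 > 35) remains, so fewer sequences can be flagged as chimeric.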
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
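The "pseudo" pooling_method described in the parameters relies on recording ASVs seen in at least 2 samples after the first pass. A minimal sketch of that bookkeeping step, with a hypothetical `pseudo_pool_priors` helper and made-up sample contents, might look like this:

```python
from collections import Counter


def pseudo_pool_priors(asvs_per_sample, min_samples=2):
    """ASVs detected in at least `min_samples` samples after the first pass."""
    counts = Counter(
        asv for asvs in asvs_per_sample.values() for asv in set(asvs)
    )
    return {asv for asv, n in counts.items() if n >= min_samples}


# Made-up first-pass results: sample ID -> ASVs detected in that sample.
first_pass = {
    "sample1": ["ASV_A", "ASV_B"],
    "sample2": ["ASV_B", "ASV_C"],
    "sample3": ["ASV_C", "ASV_B"],
}
priors = pseudo_pool_priors(first_pass)
```

The resulting set would then be supplied as prior knowledge when each sample is denoised a second time. The second pass itself is not shown here.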
dada2 denoise-ccs¶
This method denoises single-end PacBio CCS sequences, dereplicates them, and filters chimeras. Tutorial and workflow: https://
Citations¶
Callahan et al., 2016; Callahan et al., 2019
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality] The single-end demultiplexed PacBio CCS sequences to be denoised.[required]
Parameters¶
- front:
Str Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Can contain IUPAC ambiguous nucleotide codes. Note that primer direction is 5' to 3'. Primers are removed before the trim and filter step. Reads that do not contain the primer are discarded. Each read is re-oriented if the reverse complement of the read is a better match to the provided primer sequence. This is recommended for PacBio CCS reads, which come in a random mix of forward and reverse-complement orientations.[required]
- adapter:
Str Sequence of an adapter ligated to the 3' end. The adapter and any preceding bases are trimmed. Can contain IUPAC ambiguous nucleotide codes. Note that primer direction is 5' to 3'. Primers are removed before the trim and filter step. Reads that do not contain the primer are discarded.[optional]
- max_mismatch:
Int The number of mismatches to tolerate when matching reads to primer sequences. See http://benjjneb.github.io/dada2/ for complete details.[default: 2]
- indels:
Bool Allow insertions or deletions of bases when matching adapters. Note that primer matching can be significantly slower with this option, currently about 4x slower.[default: False]
- trunc_len:
Int Position at which sequences should be truncated due to a decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed. Note: because PacBio CCS sequences normally have very high quality scores, there is usually no need to truncate them.[default: 0]
- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- max_ee:
Float Reads with a number of expected errors higher than this value will be discarded.[default: 2.0]
- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than trunc_len, it is discarded.[default: 2]
- min_len:
Int Remove reads with length less than this value. The minimum length is enforced after trimming and truncation. For 16S PacBio CCS, 1000 is suggested.[default: 20]
- max_len:
Int Remove reads that are longer than this value, prior to trimming or truncation. If 0 is provided, no reads will be removed based on length. For 16S PacBio CCS, 1600 is suggested.[default: 0]
- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default: 'independent']
- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default: 'consensus']
- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). A value of 3.5 is suggested. This parameter has no effect if chimera_method is "none".[default: 3.5]
- allow_one_off:
Bool If True, a sequence is also identified as a bimera if it is one mismatch or indel away from an exact bimera.[default: False]
- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default: 1]
- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default: 1000000]
- hashed_feature_ids:
Bool If True, the feature IDs in the resulting table will be presented as hashes of the sequences defining each feature. The hash is always the same for the same sequence, so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default: True]
- retain_all_samples:
Bool If True, all samples input to DADA2 are retained in the output; if False, samples with zero total frequency are removed from the table.[default: True]
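The front and max_mismatch parameters describe primer matching with IUPAC ambiguity codes and read re-orientation. The sketch below shows the idea under simplifying assumptions: it allows substitutions only (no indels), tries the forward orientation first instead of scoring both orientations, and uses hypothetical helper names and a made-up primer and read.

```python
# Bases allowed by each IUPAC nucleotide code.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a sequence that may contain IUPAC codes."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_mismatches(primer, window):
    """Count positions where the read base is not allowed by the primer code."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, window))


def find_primer_end(read, primer, max_mismatch=2):
    """Index just past the first acceptable primer match, or None."""
    for start in range(len(read) - len(primer) + 1):
        window = read[start:start + len(primer)]
        if primer_mismatches(primer, window) <= max_mismatch:
            return start + len(primer)
    return None


def orient_and_trim(read, front, max_mismatch=2):
    """Trim the 5' primer and anything before it; re-orient the read if needed."""
    for candidate in (read, reverse_complement(read)):
        end = find_primer_end(candidate, front, max_mismatch)
        if end is not None:
            return candidate[end:]
    return None  # reads without the primer are discarded
```

For instance, with the made-up primer `AGRGTT`, the read `TTAGAGTTCCGGA` and its reverse complement both yield `CCGGA`, while a read lacking the primer yields `None`.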
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
dada2 plot-base-transitions¶
Generates visualizations of DADA2 output statistics.
Citations¶
Inputs¶
- base_transition_stats:
DADA2BaseTransitionStats DADA2 base transition statistics.[required]
Parameters¶
- nominalq:
Bool Sets the nominalq line of the visualization.[default: False]
- error_in:
Bool Sets the input error line of the visualization.[default: False]
- error_out:
Bool Sets the output error line of the visualization.[default: True]
Outputs¶
- visualization:
Visualization <no description>[required]
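The nominalq option is not defined further on this page. Assuming it refers to the error rate implied by the nominal definition of a Phred quality score, the reference line could be computed roughly as below. Both the interpretation and the even split across the three possible substitutions are assumptions, and the function names are hypothetical.

```python
def nominal_error_rate(q):
    """Error probability implied by a Phred quality score: 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def nominal_substitution_rate(q):
    """Assumed rate for one specific substitution (error split evenly three ways)."""
    return nominal_error_rate(q) / 3


# Nominal rates for a few quality scores.
rates = {q: nominal_error_rate(q) for q in (10, 20, 30, 40)}
```

Comparing observed transition rates against such a line indicates whether the quality scores over- or under-state the actual error rate.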
This QIIME 2 plugin wraps DADA2 and supports sequence quality control for single-end and paired-end reads using the DADA2 R library.
- version:
2026.1.0.dev0 - website: http://
benjjneb .github .io /dada2/ - user support:
- Please post to the QIIME 2 forum for help with this plugin: https://
forum .qiime2 .org - citations:
- Callahan et al., 2016
Actions¶
| Name | Type | Short Description |
|---|---|---|
| denoise-single | method | Denoise and dereplicate single-end sequences |
| denoise-paired | method | Denoise and dereplicate paired-end sequences |
| denoise-pyro | method | Denoise and dereplicate single-end pyrosequences |
| denoise-ccs | method | Denoise and dereplicate single-end Pacbio CCS |
| plot-base-transitions | visualizer | DADA2 diagnostic statistics |
Artifact Classes¶
SampleData[DADA2Stats] |
DADA2BaseTransitionStats |
Formats¶
DADA2StatsFormat |
DADA2StatsDirFmt |
DADA2BaseTransitionStatsFormat |
DADA2BaseTransitionStatsDirFmt |
dada2 denoise-single¶
This method denoises single-end sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality | PairedEndSequencesWithQuality] The single-end demultiplexed sequences to be denoised.[required]
Parameters¶
- trunc_len:
Int Position at which sequences should be truncated due to decrease in quality. This truncates the 3' end of the of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed[required]
- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the of the input sequences, which will be the bases that were sequenced in the first cycles.[default:
0]- max_ee:
Float Reads with number of expected errors higher than this value will be discarded.[default:
2.0]- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than
trunc_len, it is discarded.[default:2]- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default:
'independent']- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default:
'consensus']- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). This parameter has no effect if chimera_method is "none".[default:
1.0]- allow_one_off:
Bool Bimeras that are one-off from exact are also identified if the
allow_one_offargument is True.If True, a sequence will be identified as bimera if it is one mismatch or indel away from an exact bimera.[default:False]- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default:
1]- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default:
1000000]- hashed_feature_ids:
Bool If true, the feature ids in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default:
True]- retain_all_samples:
Bool If True all samples input to dada2 will be retained in the output of dada2, if false samples with zero total frequency are removed from the table.[default:
True]
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
Examples¶
denoise_single¶
# Command line: download the example data, then run the action.
wget -O 'demux-single.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'

qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux-single.qza \
  --p-trim-left 0 \
  --p-trunc-len 120 \
  --o-representative-sequences representative-sequences.qza \
  --o-table table.qza \
  --o-denoising-stats denoising-stats.qza \
  --o-base-transition-stats base-transition-stats.qza

# Python API: the same steps using the qiime2 Artifact API.
from qiime2 import Artifact
from urllib import request
import qiime2.plugins.dada2.actions as dada2_actions

url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'
fn = 'demux-single.qza'
request.urlretrieve(url, fn)
demux_single = Artifact.load(fn)

table, representative_sequences, denoising_stats, base_transition_stats = dada2_actions.denoise_single(
    demultiplexed_seqs=demux_single,
    trim_left=0,
    trunc_len=120,
)

- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: demux-single.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 dada2 denoise-single tool:
  - Set "demultiplexed_seqs" to #: demux-single.qza
  - Set "trunc_len" to 120
  - Expand the additional options section
    - Leave "trim_left" as its default value of 0
  - Press the Execute button.

# R: the same steps through reticulate.
library(reticulate)

Artifact <- import("qiime2")$Artifact
dada2_actions <- import("qiime2.plugins.dada2.actions")
request <- import("urllib")$request

url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'
fn <- 'demux-single.qza'
request$urlretrieve(url, fn)
demux_single <- Artifact$load(fn)

action_results <- dada2_actions$denoise_single(
    demultiplexed_seqs=demux_single,
    trim_left=0L,
    trunc_len=120L
)
representative_sequences <- action_results$representative_sequences
table <- action_results$table
denoising_stats <- action_results$denoising_stats
base_transition_stats <- action_results$base_transition_stats

# Source of this example; `use` is the usage driver object supplied by QIIME 2.
from q2_dada2._examples import denoise_single

denoise_single(use)
dada2 denoise-paired¶
This method denoises paired-end sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[PairedEndSequencesWithQuality] The paired-end demultiplexed sequences to be denoised.[required]
Parameters¶
- trunc_len_f:
Int Position at which forward read sequences should be truncated due to decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. After this parameter is applied there must still be at least a 12 nucleotide overlap between the forward and reverse reads. If 0 is provided, no truncation or length filtering will be performed.[required]
- trunc_len_r:
Int Position at which reverse read sequences should be truncated due to decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. After this parameter is applied there must still be at least a 12 nucleotide overlap between the forward and reverse reads. If 0 is provided, no truncation or length filtering will be performed.[required]
- trim_left_f:
Int Position at which forward read sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- trim_left_r:
Int Position at which reverse read sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- max_ee_f:
Float Forward reads with number of expected errors higher than this value will be discarded.[default: 2.0]
- max_ee_r:
Float Reverse reads with number of expected errors higher than this value will be discarded.[default: 2.0]
- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than trunc_len_f or trunc_len_r (depending on the direction of the read) it is discarded.[default: 2]
- min_overlap:
Int%Range(4, None) The minimum length of the overlap required for merging the forward and reverse reads.[default: 12]
- max_merge_mismatch:
Int The maximum number of mismatches allowed in the overlap region when merging reads. If 0, only exact overlaps are allowed.[default: 0]
- trim_overhang:
Bool If True, "overhangs" in the alignment after merging are trimmed off. "Overhangs" are when the reverse read extends past the start of the forward read, and vice-versa, as can happen when reads are longer than the amplicon and read into the other-direction primer region.[default: False]
- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default: 'independent']
- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default: 'consensus']
- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). This parameter has no effect if chimera_method is "none".[default: 1.0]
- allow_one_off:
Bool Bimeras that are one-off from exact are also identified if the allow_one_off argument is True. If True, a sequence will be identified as a bimera if it is one mismatch or indel away from an exact bimera.[default: False]
- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default: 1]
- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default: 1000000]
- hashed_feature_ids:
Bool If True, the feature ids in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence, so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default: True]
- retain_all_samples:
Bool If True, all samples input to dada2 will be retained in the output of dada2; if False, samples with zero total frequency are removed from the table.[default: True]
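The requirement that truncated forward and reverse reads still overlap can be checked with simple arithmetic before running the action. This is a minimal sketch, assuming both reads start at opposite ends of the amplicon and are at least as long as their truncation lengths. The 253 nt amplicon length is a hypothetical value, and the function names are not part of the plugin.

```python
def expected_overlap(trunc_len_f, trunc_len_r, amplicon_length):
    """Bases shared by the truncated forward and reverse reads of one amplicon."""
    return trunc_len_f + trunc_len_r - amplicon_length


def can_merge(trunc_len_f, trunc_len_r, amplicon_length, min_overlap=12):
    """True if the truncated reads overlap by at least min_overlap bases."""
    return expected_overlap(trunc_len_f, trunc_len_r, amplicon_length) >= min_overlap


# Truncation lengths from the example on this page; amplicon length is hypothetical.
print(expected_overlap(150, 140, 253))  # 37
print(can_merge(150, 140, 253))         # True
print(can_merge(120, 120, 253))         # False
```

When a truncation length of 0 is used (no truncation), the actual read length would have to be substituted for it in this calculation.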
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence, and these sequences will be the joined paired-end sequences.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
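Because hashed_feature_ids derives each feature ID from the sequence itself, the same sequence receives the same ID in every run, which is what makes tables from separate runs mergeable. The sketch below assumes an MD5 hex digest as the hash; that is an assumption about the plugin, not something this page states. The sequences and counts are hypothetical.

```python
import hashlib


def feature_id(sequence):
    """Hex digest of the sequence (MD5 is assumed here for illustration)."""
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


# Hypothetical per-run feature counts keyed by hashed ID.
run_a = {feature_id("ACGTACGT"): 12}
run_b = {feature_id("ACGTACGT"): 5, feature_id("TTGGCCAA"): 7}

# Identical sequences collapse onto one ID when the runs are combined.
merged = dict(run_a)
for fid, count in run_b.items():
    merged[fid] = merged.get(fid, 0) + count

print(len(merged), merged[feature_id("ACGTACGT")])  # 2 17
```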
Examples¶
denoise_paired¶
# Command line: download the example data, then run the action.
wget -O 'demux-paired.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired.qza \
  --p-trunc-len-f 150 \
  --p-trunc-len-r 140 \
  --o-representative-sequences representative-sequences.qza \
  --o-table table.qza \
  --o-denoising-stats denoising-stats.qza \
  --o-base-transition-stats base-transition-stats.qza

# Python API: the same steps using the qiime2 Artifact API.
from qiime2 import Artifact
from urllib import request
import qiime2.plugins.dada2.actions as dada2_actions

url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'
fn = 'demux-paired.qza'
request.urlretrieve(url, fn)
demux_paired = Artifact.load(fn)

table, representative_sequences, denoising_stats, base_transition_stats = dada2_actions.denoise_paired(
    demultiplexed_seqs=demux_paired,
    trunc_len_f=150,
    trunc_len_r=140,
)

- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: demux-paired.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 dada2 denoise-paired tool:
  - Set "demultiplexed_seqs" to #: demux-paired.qza
  - Set "trunc_len_f" to 150
  - Set "trunc_len_r" to 140
  - Press the Execute button.

# R: the same steps through reticulate.
library(reticulate)

Artifact <- import("qiime2")$Artifact
dada2_actions <- import("qiime2.plugins.dada2.actions")
request <- import("urllib")$request

url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'
fn <- 'demux-paired.qza'
request$urlretrieve(url, fn)
demux_paired <- Artifact$load(fn)

action_results <- dada2_actions$denoise_paired(
    demultiplexed_seqs=demux_paired,
    trunc_len_f=150L,
    trunc_len_r=140L
)
representative_sequences <- action_results$representative_sequences
table <- action_results$table
denoising_stats <- action_results$denoising_stats
base_transition_stats <- action_results$base_transition_stats

# Source of this example; `use` is the usage driver object supplied by QIIME 2.
from q2_dada2._examples import denoise_paired

denoise_paired(use)
dada2 denoise-pyro¶
This method denoises single-end pyrosequencing sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality] The single-end demultiplexed pyrosequencing sequences (e.g. 454, IonTorrent) to be denoised.[required]
Parameters¶
- trunc_len:
Int Position at which sequences should be truncated due to decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed.[required]
- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- max_ee:
Float Reads with number of expected errors higher than this value will be discarded.[default: 2.0]
- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than trunc_len, it is discarded.[default: 2]
- max_len:
Int Remove reads prior to trimming or truncation which are longer than this value. If 0 is provided, no reads will be removed based on length.[default: 0]
- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default: 'independent']
- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default: 'consensus']
- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). This parameter has no effect if chimera_method is "none".[default: 1.0]
- allow_one_off:
Bool Bimeras that are one-off from exact are also identified if the allow_one_off argument is True. If True, a sequence will be identified as a bimera if it is one mismatch or indel away from an exact bimera.[default: False]
- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default: 1]
- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default: 250000]
- hashed_feature_ids:
Bool If True, the feature ids in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence, so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default: True]
- retain_all_samples:
Bool If True, all samples input to dada2 will be retained in the output of dada2; if False, samples with zero total frequency are removed from the table.[default: True]
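The trunc_q, trunc_len, and max_ee filters can be sketched in a few lines. This assumes the usual Phred definition, in which a quality score Q implies an error probability of 10^(-Q/10), and sums those probabilities to get a read's expected errors. The function names and the order of the steps are a simplification for illustration, not the plugin's implementation.

```python
def expected_errors(qualities):
    """Sum of per-base error probabilities, p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in qualities)


def filter_read(qualities, trunc_len=0, max_ee=2.0, trunc_q=2):
    """Return the retained quality scores, or None if the read is discarded."""
    # Truncate at the first score less than or equal to trunc_q.
    for i, q in enumerate(qualities):
        if q <= trunc_q:
            qualities = qualities[:i]
            break
    # A trunc_len of 0 disables truncation and length filtering.
    if trunc_len > 0:
        if len(qualities) < trunc_len:
            return None
        qualities = qualities[:trunc_len]
    # Discard reads whose expected errors exceed max_ee.
    if expected_errors(qualities) > max_ee:
        return None
    return qualities


print(filter_read([30, 30, 2, 30], trunc_len=3))  # None
print(len(filter_read([30] * 10, trunc_len=8)))   # 8
```

In the first call the read is cut to two bases at the score of 2, which leaves it shorter than trunc_len, so it is discarded.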
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
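The first half of the pseudo-pooling procedure described under pooling_method, recording the ASVs seen in at least 2 samples, can be sketched as follows. The sample names, ASV labels, and function name are hypothetical, and the second denoising pass that uses the recorded ASVs as prior knowledge is not shown.

```python
from collections import Counter


def shared_asvs(per_sample_asvs, min_samples=2):
    """ASVs detected in at least min_samples samples during the first pass."""
    seen = Counter()
    for asvs in per_sample_asvs.values():
        seen.update(set(asvs))  # count each ASV once per sample
    return {asv for asv, n in seen.items() if n >= min_samples}


# Hypothetical first-pass results: sample ID -> ASVs found in that sample.
first_pass = {"s1": ["A", "B"], "s2": ["B", "C"], "s3": ["C"]}
print(sorted(shared_asvs(first_pass)))  # ['B', 'C']
```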
dada2 denoise-ccs¶
This method denoises single-end PacBio CCS sequences, dereplicates them, and filters chimeras. Tutorial and workflow: https://
Citations¶
Callahan et al., 2016; Callahan et al., 2019
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality] The single-end demultiplexed PacBio CCS sequences to be denoised.[required]
Parameters¶
- front:
Str Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Can contain IUPAC ambiguous nucleotide codes. Note, primer direction is 5' to 3'. Primers are removed before the trim and filter step. Reads that do not contain the primer are discarded. Each read is re-oriented if the reverse complement of the read is a better match to the provided primer sequence. This is recommended for PacBio CCS reads, which come in a random mix of forward and reverse-complement orientations.[required]
- adapter:
Str Sequence of an adapter ligated to the 3' end. The adapter and any preceding bases are trimmed. Can contain IUPAC ambiguous nucleotide codes. Note, primer direction is 5' to 3'. Primers are removed before the trim and filter step. Reads that do not contain the primer are discarded.[optional]
- max_mismatch:
Int The number of mismatches to tolerate when matching reads to primer sequences - see http://benjjneb.github.io/dada2/ for complete details.[default: 2]
- indels:
Bool Allow insertions or deletions of bases when matching adapters. Note that primer matching with indels can be significantly slower, currently about 4x slower.[default: False]
- trunc_len:
Int Position at which sequences should be truncated due to decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed. Note: Because PacBio CCS sequences normally have very high quality scores, there is no need to truncate them.[default: 0]
- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- max_ee:
Float Reads with number of expected errors higher than this value will be discarded.[default: 2.0]
- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than trunc_len, it is discarded.[default: 2]
- min_len:
Int Remove reads with length less than min_len. min_len is enforced after trimming and truncation. For 16S PacBio CCS, suggest 1000.[default: 20]
- max_len:
Int Remove reads prior to trimming or truncation which are longer than this value. If 0 is provided, no reads will be removed based on length. For 16S PacBio CCS, suggest 1600.[default: 0]
- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default: 'independent']
- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default: 'consensus']
- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). Suggest 3.5. This parameter has no effect if chimera_method is "none".[default: 3.5]
- allow_one_off:
Bool Bimeras that are one-off from exact are also identified if the allow_one_off argument is True. If True, a sequence will be identified as a bimera if it is one mismatch or indel away from an exact bimera.[default: False]
- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default: 1]
- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default: 1000000]
- hashed_feature_ids:
Bool If True, the feature ids in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence, so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default: True]
- retain_all_samples:
Bool If True, all samples input to dada2 will be retained in the output of dada2; if False, samples with zero total frequency are removed from the table.[default: True]
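The re-orientation behaviour described for the front primer can be illustrated with a simplified sketch. It assumes the primer sits at the very start of the read, ignores indels, and expects reads made up only of A, C, G, and T. The primer and read below are hypothetical, and none of the function names belong to the plugin or to DADA2.

```python
# IUPAC nucleotide codes and the bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def primer_mismatches(primer, read):
    """Mismatches between an IUPAC primer and the start of the read."""
    if len(read) < len(primer):
        return len(primer)
    return sum(base not in IUPAC[code] for code, base in zip(primer, read))


def orient_and_trim(read, front, max_mismatch=2):
    """Pick the better-matching orientation and strip the primer, or return None."""
    candidates = [read, reverse_complement(read)]
    best = min(candidates, key=lambda r: primer_mismatches(front, r))
    if primer_mismatches(front, best) > max_mismatch:
        return None  # read does not contain the primer
    return best[len(front):]


read = "AGAGTTCCCCGGGG"  # hypothetical read starting with the primer
print(orient_and_trim(read, "AGRGTT"))                      # CCCCGGGG
print(orient_and_trim(reverse_complement(read), "AGRGTT"))  # CCCCGGGG
```

Both orientations of the same molecule end up as the same trimmed sequence, which is the point of re-orienting reads before denoising.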
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
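The role of min_fold_parent_over_abundance can be illustrated by listing which sequences qualify as potential parents of a candidate chimera. This is a minimal sketch with hypothetical abundances; the function name is illustrative, and treating the threshold as inclusive is an assumption.

```python
def candidate_parents(abundances, query, min_fold_parent_over_abundance=1.0):
    """Sequences abundant enough to be considered parents of the query."""
    threshold = min_fold_parent_over_abundance * abundances[query]
    return [seq for seq, n in abundances.items()
            if seq != query and n >= threshold]


# Hypothetical sequence abundances within one sample.
abundances = {"parent-1": 400, "parent-2": 120, "query": 100}
print(candidate_parents(abundances, "query"))       # ['parent-1', 'parent-2']
print(candidate_parents(abundances, "query", 3.5))  # ['parent-1']
```

Raising the value, as with the default of 3.5 used by this action, narrows the pool of possible parents and so makes chimera calls more conservative.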
dada2 plot-base-transitions¶
Generates visualizations of DADA2 output statistics.
Citations¶
Inputs¶
- base_transition_stats:
DADA2BaseTransitionStats DADA2 base transition statistics.[required]
Parameters¶
- nominalq:
Bool Sets the nominalq line of the visualization.[default: False]
- error_in:
Bool Sets the input error line of the visualization.[default: False]
- error_out:
Bool Sets the output error line of the visualization.[default: True]
Outputs¶
- visualization:
Visualization <no description>[required]
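For context on the nominalq option: a nominal quality line shows the error rate that a quality score implies by definition, Q = -10 log10(p). The sketch below computes that rate for a few scores. It assumes the standard Phred definition, and how the visualizer apportions this probability among the possible substitutions is not covered here.

```python
def nominal_error_rate(q):
    """Error probability implied by Phred quality score q."""
    return 10 ** (-q / 10)


for q in (10, 20, 30, 40):
    print(q, nominal_error_rate(q))
```

Comparing observed transition rates against this curve shows whether the sequencer's quality scores over- or under-state the actual error rate.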
This QIIME 2 plugin wraps DADA2 and supports sequence quality control for single-end and paired-end reads using the DADA2 R library.
- version:
2026.1.0.dev0 - website: http://
benjjneb .github .io /dada2/ - user support:
- Please post to the QIIME 2 forum for help with this plugin: https://
forum .qiime2 .org - citations:
- Callahan et al., 2016
Actions¶
| Name | Type | Short Description |
|---|---|---|
| denoise-single | method | Denoise and dereplicate single-end sequences |
| denoise-paired | method | Denoise and dereplicate paired-end sequences |
| denoise-pyro | method | Denoise and dereplicate single-end pyrosequences |
| denoise-ccs | method | Denoise and dereplicate single-end Pacbio CCS |
| plot-base-transitions | visualizer | DADA2 diagnostic statistics |
Artifact Classes¶
SampleData[DADA2Stats] |
DADA2BaseTransitionStats |
Formats¶
DADA2StatsFormat |
DADA2StatsDirFmt |
DADA2BaseTransitionStatsFormat |
DADA2BaseTransitionStatsDirFmt |
dada2 denoise-single¶
This method denoises single-end sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality | PairedEndSequencesWithQuality] The single-end demultiplexed sequences to be denoised.[required]
Parameters¶
- trunc_len:
Int Position at which sequences should be truncated due to decrease in quality. This truncates the 3' end of the of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed[required]
- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the of the input sequences, which will be the bases that were sequenced in the first cycles.[default:
0]- max_ee:
Float Reads with number of expected errors higher than this value will be discarded.[default:
2.0]- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than
trunc_len, it is discarded.[default:2]- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default:
'independent']- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default:
'consensus']- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). This parameter has no effect if chimera_method is "none".[default:
1.0]- allow_one_off:
Bool Bimeras that are one-off from exact are also identified if the
allow_one_offargument is True.If True, a sequence will be identified as bimera if it is one mismatch or indel away from an exact bimera.[default:False]- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default:
1]- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default:
1000000]- hashed_feature_ids:
Bool If true, the feature ids in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default:
True]- retain_all_samples:
Bool If True all samples input to dada2 will be retained in the output of dada2, if false samples with zero total frequency are removed from the table.[default:
True]
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
Examples¶
denoise_single¶
wget -O 'demux-single.qza' \
'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'
qiime dada2 denoise-single \
--i-demultiplexed-seqs demux-single.qza \
--p-trim-left 0 \
--p-trunc-len 120 \
--o-representative-sequences representative-sequences.qza \
--o-table table.qza \
--o-denoising-stats denoising-stats.qza \
--o-base-transition-stats base-transition-stats.qzafrom qiime2 import Artifact
from urllib import request
import qiime2.plugins.dada2.actions as dada2_actions
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'
fn = 'demux-single.qza'
request.urlretrieve(url, fn)
demux_single = Artifact.load(fn)
table, representative_sequences, denoising_stats, base_transition_stats = dada2_actions.denoise_single(
demultiplexed_seqs=demux_single,
trim_left=0,
trunc_len=120,
)- Using the
Upload Datatool: - On the first tab (Regular), press the
Paste/Fetchdata button at the bottom.- Set "Name" (first text-field) to:
demux-single.qza - In the larger text-area, copy-and-paste: https://
amplicon -docs .qiime2 .org /en /latest /data /examples /dada2 /denoise -single /1 /demux -single .qza - ("Type", "Genome", and "Settings" can be ignored)
- Set "Name" (first text-field) to:
- Press the
Startbutton at the bottom.
- On the first tab (Regular), press the
- Using the
qiime2 dada2 denoise-singletool: - Set "demultiplexed_seqs" to
#: demux-single.qza - Set "trunc_len" to
120 - Expand the
additional optionssection- Leave "trim_left" as its default value of
0
- Leave "trim_left" as its default value of
- Press the
Executebutton.
- Set "demultiplexed_seqs" to
library(reticulate)
Artifact <- import("qiime2")$Artifact
dada2_actions <- import("qiime2.plugins.dada2.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'
fn <- 'demux-single.qza'
request$urlretrieve(url, fn)
demux_single <- Artifact$load(fn)
action_results <- dada2_actions$denoise_single(
demultiplexed_seqs=demux_single,
trim_left=0L,
trunc_len=120L,
)
representative_sequences <- action_results$representative_sequences
table <- action_results$table
denoising_stats <- action_results$denoising_stats
base_transition_stats <- action_results$base_transition_stats
from q2_dada2._examples import denoise_single
denoise_single(use)
dada2 denoise-paired¶
This method denoises paired-end sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[PairedEndSequencesWithQuality] The paired-end demultiplexed sequences to be denoised.[required]
Parameters¶
- trunc_len_f:
Int Position at which forward read sequences should be truncated due to decrease in quality. This truncates the 3' end of the of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. After this parameter is applied there must still be at least a 12 nucleotide overlap between the forward and reverse reads. If 0 is provided, no truncation or length filtering will be performed[required]
- trunc_len_r:
Int Position at which reverse read sequences should be truncated due to decrease in quality. This truncates the 3' end of the of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. After this parameter is applied there must still be at least a 12 nucleotide overlap between the forward and reverse reads. If 0 is provided, no truncation or length filtering will be performed[required]
- trim_left_f:
Int Position at which forward read sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default:
0]- trim_left_r:
Int Position at which reverse read sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default:
0]- max_ee_f:
Float Forward reads with number of expected errors higher than this value will be discarded.[default:
2.0]- max_ee_r:
Float Reverse reads with number of expected errors higher than this value will be discarded.[default:
2.0]- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than
trunc_len_fortrunc_len_r(depending on the direction of the read) it is discarded.[default:2]- min_overlap:
Int%Range(4, None) The minimum length of the overlap required for merging the forward and reverse reads.[default:
12]- max_merge_mismatch:
Int The maximum number of mismatches allowed in the overlap region when merging reads. If 0, only exact overlaps are allowed.[default:
0]- trim_overhang:
Bool If TRUE, "overhangs" in the alignment after merging are trimmed off. "Overhangs" are when the reverse read extends past the start of the forward read, and vice-versa, as can happen when reads are longer than the amplicon and read into the other-direction primer region.[default:
False]- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default: 'independent']
- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default: 'consensus']
- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). This parameter has no effect if chimera_method is "none".[default: 1.0]
- allow_one_off:
Bool If True, a sequence is also identified as a bimera when it is one mismatch or indel away from an exact bimera.[default: False]
- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default: 1]
- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default: 1000000]
- hashed_feature_ids:
Bool If True, the feature ids in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence, so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default: True]
- retain_all_samples:
Bool If True, all samples input to dada2 are retained in the output; if False, samples with zero total frequency are removed from the table.[default: True]
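The role of min_fold_parent_over_abundance can be pictured with a minimal Python sketch. This is not the DADA2 implementation: the function name `candidate_parents`, the toy abundances, and the strict `>` comparison are assumptions made for illustration only.

```python
from typing import Dict, List


def candidate_parents(
    abundances: Dict[str, int],
    query: str,
    min_fold_parent_over_abundance: float = 1.0,
) -> List[str]:
    """Return sequences abundant enough to be considered parents of `query`.

    Assumption: a parent must be strictly more abundant than
    fold * abundance(query). The real chimera check also compares the
    sequences themselves, which this sketch leaves out.
    """
    if min_fold_parent_over_abundance < 1:
        raise ValueError("min_fold_parent_over_abundance should be >= 1")
    threshold = abundances[query] * min_fold_parent_over_abundance
    parents = [
        seq for seq, count in abundances.items()
        if seq != query and count > threshold
    ]
    # Most abundant candidates first.
    return sorted(parents, key=lambda seq: abundances[seq], reverse=True)


# Hypothetical per-sample abundances of four denoised sequences.
toy = {"AAAA": 100, "CCCC": 40, "AACC": 20, "GGGG": 50}
print(candidate_parents(toy, "AACC", 1.0))
print(candidate_parents(toy, "AACC", 3.5))
```

Raising the fold value shrinks the pool of candidate parents, so fewer sequences can be flagged as chimeric.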
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence, and these sequences will be the joined paired-end sequences.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
Examples¶
denoise_paired¶
wget -O 'demux-paired.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired.qza \
  --p-trunc-len-f 150 \
  --p-trunc-len-r 140 \
  --o-representative-sequences representative-sequences.qza \
  --o-table table.qza \
  --o-denoising-stats denoising-stats.qza \
  --o-base-transition-stats base-transition-stats.qza

from qiime2 import Artifact
from urllib import request
import qiime2.plugins.dada2.actions as dada2_actions
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'
fn = 'demux-paired.qza'
request.urlretrieve(url, fn)
demux_paired = Artifact.load(fn)
table, representative_sequences, denoising_stats, base_transition_stats = dada2_actions.denoise_paired(
    demultiplexed_seqs=demux_paired,
    trunc_len_f=150,
    trunc_len_r=140,
)

- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: demux-paired.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 dada2 denoise-paired tool:
  - Set "demultiplexed_seqs" to #: demux-paired.qza
  - Set "trunc_len_f" to 150
  - Set "trunc_len_r" to 140
  - Press the Execute button.

library(reticulate)
Artifact <- import("qiime2")$Artifact
dada2_actions <- import("qiime2.plugins.dada2.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'
fn <- 'demux-paired.qza'
request$urlretrieve(url, fn)
demux_paired <- Artifact$load(fn)
action_results <- dada2_actions$denoise_paired(
    demultiplexed_seqs=demux_paired,
    trunc_len_f=150L,
    trunc_len_r=140L,
)
representative_sequences <- action_results$representative_sequences
table <- action_results$table
denoising_stats <- action_results$denoising_stats
base_transition_stats <- action_results$base_transition_stats

from q2_dada2._examples import denoise_paired
denoise_paired(use)
dada2 denoise-pyro¶
This method denoises single-end pyrosequencing sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality] The single-end demultiplexed pyrosequencing sequences (e.g. 454, IonTorrent) to be denoised.[required]
Parameters¶
- trunc_len:
Int Position at which sequences should be truncated due to decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed.[required]
- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- max_ee:
Float Reads with number of expected errors higher than this value will be discarded.[default: 2.0]
- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than trunc_len, it is discarded.[default: 2]
- max_len:
Int Remove reads prior to trimming or truncation which are longer than this value. If 0 is provided, no reads will be removed based on length.[default: 0]
- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default: 'independent']
- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default: 'consensus']
- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). This parameter has no effect if chimera_method is "none".[default: 1.0]
- allow_one_off:
Bool If True, a sequence is also identified as a bimera when it is one mismatch or indel away from an exact bimera.[default: False]
- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default: 1]
- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default: 250000]
- hashed_feature_ids:
Bool If True, the feature ids in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence, so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default: True]
- retain_all_samples:
Bool If True, all samples input to dada2 are retained in the output; if False, samples with zero total frequency are removed from the table.[default: True]
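How the trimming and filtering parameters above interact on a single read can be sketched in Python. This is an illustration, not the plugin's code: it assumes the usual Phred-based definition of expected errors (the sum of 10^(-Q/10) over the retained bases), and the order of the steps and the helper names `expected_errors` and `filter_read` are assumptions.

```python
from typing import List, Optional


def expected_errors(quals: List[int]) -> float:
    """Sum of per-base error probabilities implied by Phred scores."""
    return sum(10 ** (-q / 10) for q in quals)


def filter_read(
    seq: str,
    quals: List[int],
    trunc_len: int,
    trim_left: int = 0,
    max_ee: float = 2.0,
    trunc_q: int = 2,
    max_len: int = 0,
) -> Optional[str]:
    """Return the filtered read, or None if the read is discarded."""
    # max_len is checked before any trimming or truncation (0 disables it).
    if max_len and len(seq) > max_len:
        return None
    # Truncate at the first quality score <= trunc_q.
    for i, q in enumerate(quals):
        if q <= trunc_q:
            seq, quals = seq[:i], quals[:i]
            break
    # Truncate the 3' end to trunc_len; shorter reads are dropped (0 disables it).
    if trunc_len:
        if len(seq) < trunc_len:
            return None
        seq, quals = seq[:trunc_len], quals[:trunc_len]
    # Trim the 5' end.
    seq, quals = seq[trim_left:], quals[trim_left:]
    # Discard reads whose expected errors exceed max_ee.
    if expected_errors(quals) > max_ee:
        return None
    return seq


print(filter_read("ACGTACGTAC", [30] * 10, trunc_len=8, trim_left=2))
```

With these assumptions, a read that survives has length trunc_len minus trim_left whenever trunc_len is non-zero.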
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
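Because feature ids can be hashes of the feature sequences (see hashed_feature_ids above), the same sequence receives the same id in every run, which is what makes merging tables possible. The sketch below illustrates the idea in Python; the choice of MD5 and the helper names `feature_id` and `table_from_counts` are assumptions for illustration, and the toy counts are made up.

```python
import hashlib
from collections import Counter
from typing import Dict


def feature_id(sequence: str) -> str:
    """Deterministic id for a sequence (MD5 is an assumption here)."""
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


def table_from_counts(seq_counts: Dict[str, int]) -> Counter:
    """Re-key a {sequence: frequency} mapping by hashed feature id."""
    return Counter({feature_id(s): n for s, n in seq_counts.items()})


# Two hypothetical runs that share one sequence.
run_a = table_from_counts({"ACGTACGT": 10, "TTTTCCCC": 3})
run_b = table_from_counts({"ACGTACGT": 7, "GGGGAAAA": 5})

# Identical sequences collapse onto the same id when the tables are merged.
merged = run_a + run_b
print(len(merged), merged[feature_id("ACGTACGT")])
```

The ids only line up meaningfully if both runs trimmed and truncated reads identically, which is why the same parameters should be used for every run that is merged.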
dada2 denoise-ccs¶
This method denoises single-end PacBio CCS sequences, dereplicates them, and filters chimeras. Tutorial and workflow: https://
Citations¶
Callahan et al., 2016; Callahan et al., 2019
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality] The single-end demultiplexed PacBio CCS sequences to be denoised.[required]
Parameters¶
- front:
Str Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Can contain IUPAC ambiguous nucleotide codes. Note, primer direction is 5' to 3'. Primers are removed before the trim and filter step. Reads that do not contain the primer are discarded. Each read is re-oriented if the reverse complement of the read is a better match to the provided primer sequence. This is recommended for PacBio CCS reads, which come in a random mix of forward and reverse-complement orientations.[required]
- adapter:
Str Sequence of an adapter ligated to the 3' end. The adapter and any preceding bases are trimmed. Can contain IUPAC ambiguous nucleotide codes. Note, primer direction is 5' to 3'. Primers are removed before the trim and filter step. Reads that do not contain the primer are discarded.[optional]
- max_mismatch:
Int The number of mismatches to tolerate when matching reads to primer sequences - see http://benjjneb.github.io/dada2/ for complete details.[default: 2]
- indels:
Bool Allow insertions or deletions of bases when matching adapters. Note that primer matching can be significantly slower with indels allowed, currently about 4x slower.[default: False]
- trunc_len:
Int Position at which sequences should be truncated due to decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed. Note: because PacBio CCS sequences normally have very high quality scores, there is usually no need to truncate them.[default: 0]
- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- max_ee:
Float Reads with number of expected errors higher than this value will be discarded.[default: 2.0]
- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than trunc_len, it is discarded.[default: 2]
- min_len:
Int Remove reads shorter than this value. min_len is enforced after trimming and truncation. For 16S PacBio CCS, 1000 is suggested.[default: 20]
- max_len:
Int Remove reads prior to trimming or truncation which are longer than this value. If 0 is provided, no reads will be removed based on length. For 16S PacBio CCS, 1600 is suggested.[default: 0]
- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default: 'independent']
- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default: 'consensus']
- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). A value of 3.5 is suggested. This parameter has no effect if chimera_method is "none".[default: 3.5]
- allow_one_off:
Bool If True, a sequence is also identified as a bimera when it is one mismatch or indel away from an exact bimera.[default: False]
- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default: 1]
- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default: 1000000]
- hashed_feature_ids:
Bool If True, the feature ids in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence, so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default: True]
- retain_all_samples:
Bool If True, all samples input to dada2 are retained in the output; if False, samples with zero total frequency are removed from the table.[default: True]
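The primer handling described for front and max_mismatch (IUPAC ambiguity codes, a mismatch budget, and re-orientation of reverse-complemented reads) can be illustrated with a simplified Python sketch. It is not how DADA2 implements primer removal: it counts substitutions only (no indels), handles the 5' primer only, and the names `find_primer` and `orient_and_trim` as well as the toy primer are hypothetical.

```python
from typing import Optional, Tuple

# Bases matched by each IUPAC nucleotide code.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_primer(read: str, primer: str, max_mismatch: int = 2) -> Optional[Tuple[int, int]]:
    """Return (mismatches, start) of the best hit within budget, else None."""
    best = None
    for start in range(len(read) - len(primer) + 1):
        window = read[start:start + len(primer)]
        mm = sum(base not in IUPAC[code] for code, base in zip(primer, window))
        if mm <= max_mismatch and (best is None or mm < best[0]):
            best = (mm, start)
    return best


def orient_and_trim(read: str, front: str, max_mismatch: int = 2) -> Optional[str]:
    """Orient the read on the 5' primer, then trim the primer and preceding bases."""
    flipped = reverse_complement(read)
    fwd = find_primer(read, front, max_mismatch)
    rev = find_primer(flipped, front, max_mismatch)
    if fwd is None and rev is None:
        return None  # reads without the primer are discarded
    # Use the reverse complement only when it matches the primer better.
    if rev is not None and (fwd is None or rev[0] < fwd[0]):
        read, hit = flipped, rev
    else:
        hit = fwd
    _, start = hit
    return read[start + len(front):]


read = "TTAGAGTTCCCCGGGG"  # toy read carrying the toy primer AGRGTT
print(orient_and_trim(read, "AGRGTT"))
print(orient_and_trim(reverse_complement(read), "AGRGTT"))
```

Both calls yield the same trimmed sequence, which is the point of re-orientation: reads arriving in either orientation end up in a common one before denoising.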
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
dada2 plot-base-transitions¶
Generates visualizations of DADA2 output statistics.
Citations¶
Inputs¶
- base_transition_stats:
DADA2BaseTransitionStats DADA2 base transition statistics.[required]
Parameters¶
- nominalq:
Bool Sets the nominalq line of the visualization.[default: False]
- error_in:
Bool Sets the input error line of the visualization.[default: False]
- error_out:
Bool Sets the output error line of the visualization.[default: True]
Outputs¶
- visualization:
Visualization <no description>[required]
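As background for the nominalq option, the error rate implied by the nominal definition of a quality score Q is 10^(-Q/10). The Python sketch below computes such nominal rates for each ordered pair of nucleotides. Splitting the error probability evenly across the three possible substitutions and the `A2C`-style key names are assumptions made for illustration, not a statement of what the visualizer computes.

```python
from typing import Dict


def nominal_error_rate(q: int) -> float:
    """Per-base error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def nominal_transition_rates(q: int) -> Dict[str, float]:
    """Nominal rate for every ordered nucleotide pair at quality score q.

    Assumption: an erroneous base is equally likely to be read as any of
    the three other nucleotides.
    """
    p_err = nominal_error_rate(q)
    rates = {}
    for true_base in "ACGT":
        for read_base in "ACGT":
            key = f"{true_base}2{read_base}"
            rates[key] = 1 - p_err if true_base == read_base else p_err / 3
    return rates


for q in (10, 20, 30, 40):
    print(q, nominal_transition_rates(q)["A2C"])
```

Comparing rates like these against the rates learned from the data indicates how closely the reported quality scores track the observed error behaviour.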
This QIIME 2 plugin wraps DADA2 and supports sequence quality control for single-end and paired-end reads using the DADA2 R library.
- version:
2026.1.0.dev0 - website: http://
benjjneb .github .io /dada2/ - user support:
- Please post to the QIIME 2 forum for help with this plugin: https://
forum .qiime2 .org - citations:
- Callahan et al., 2016
Actions¶
| Name | Type | Short Description |
|---|---|---|
| denoise-single | method | Denoise and dereplicate single-end sequences |
| denoise-paired | method | Denoise and dereplicate paired-end sequences |
| denoise-pyro | method | Denoise and dereplicate single-end pyrosequences |
| denoise-ccs | method | Denoise and dereplicate single-end Pacbio CCS |
| plot-base-transitions | visualizer | DADA2 diagnostic statistics |
Artifact Classes¶
SampleData[DADA2Stats] |
DADA2BaseTransitionStats |
Formats¶
DADA2StatsFormat |
DADA2StatsDirFmt |
DADA2BaseTransitionStatsFormat |
DADA2BaseTransitionStatsDirFmt |
dada2 denoise-single¶
This method denoises single-end sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality | PairedEndSequencesWithQuality] The single-end demultiplexed sequences to be denoised.[required]
Parameters¶
- trunc_len:
Int Position at which sequences should be truncated due to decrease in quality. This truncates the 3' end of the of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed[required]
- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the of the input sequences, which will be the bases that were sequenced in the first cycles.[default:
0]- max_ee:
Float Reads with number of expected errors higher than this value will be discarded.[default:
2.0]- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than
trunc_len, it is discarded.[default:2]- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default:
'independent']- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default:
'consensus']- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). This parameter has no effect if chimera_method is "none".[default:
1.0]- allow_one_off:
Bool Bimeras that are one-off from exact are also identified if the
allow_one_offargument is True.If True, a sequence will be identified as bimera if it is one mismatch or indel away from an exact bimera.[default:False]- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default:
1]- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default:
1000000]- hashed_feature_ids:
Bool If true, the feature ids in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default:
True]- retain_all_samples:
Bool If True all samples input to dada2 will be retained in the output of dada2, if false samples with zero total frequency are removed from the table.[default:
True]
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
Examples¶
denoise_single¶
wget -O 'demux-single.qza' \
'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'
qiime dada2 denoise-single \
--i-demultiplexed-seqs demux-single.qza \
--p-trim-left 0 \
--p-trunc-len 120 \
--o-representative-sequences representative-sequences.qza \
--o-table table.qza \
--o-denoising-stats denoising-stats.qza \
--o-base-transition-stats base-transition-stats.qzafrom qiime2 import Artifact
from urllib import request
import qiime2.plugins.dada2.actions as dada2_actions
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'
fn = 'demux-single.qza'
request.urlretrieve(url, fn)
demux_single = Artifact.load(fn)
table, representative_sequences, denoising_stats, base_transition_stats = dada2_actions.denoise_single(
demultiplexed_seqs=demux_single,
trim_left=0,
trunc_len=120,
)- Using the
Upload Datatool: - On the first tab (Regular), press the
Paste/Fetchdata button at the bottom.- Set "Name" (first text-field) to:
demux-single.qza - In the larger text-area, copy-and-paste: https://
amplicon -docs .qiime2 .org /en /latest /data /examples /dada2 /denoise -single /1 /demux -single .qza - ("Type", "Genome", and "Settings" can be ignored)
- Set "Name" (first text-field) to:
- Press the
Startbutton at the bottom.
- On the first tab (Regular), press the
- Using the
qiime2 dada2 denoise-singletool: - Set "demultiplexed_seqs" to
#: demux-single.qza - Set "trunc_len" to
120 - Expand the
additional optionssection- Leave "trim_left" as its default value of
0
- Leave "trim_left" as its default value of
- Press the
Executebutton.
- Set "demultiplexed_seqs" to
library(reticulate)
Artifact <- import("qiime2")$Artifact
dada2_actions <- import("qiime2.plugins.dada2.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'
fn <- 'demux-single.qza'
request$urlretrieve(url, fn)
demux_single <- Artifact$load(fn)
action_results <- dada2_actions$denoise_single(
demultiplexed_seqs=demux_single,
trim_left=0L,
trunc_len=120L,
)
representative_sequences <- action_results$representative_sequences
table <- action_results$table
denoising_stats <- action_results$denoising_stats
base_transition_stats <- action_results$base_transition_stats
from q2_dada2._examples import denoise_single
denoise_single(use)
dada2 denoise-paired¶
This method denoises paired-end sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[PairedEndSequencesWithQuality] The paired-end demultiplexed sequences to be denoised.[required]
Parameters¶
- trunc_len_f:
Int Position at which forward read sequences should be truncated due to decrease in quality. This truncates the 3' end of the of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. After this parameter is applied there must still be at least a 12 nucleotide overlap between the forward and reverse reads. If 0 is provided, no truncation or length filtering will be performed[required]
- trunc_len_r:
Int Position at which reverse read sequences should be truncated due to decrease in quality. This truncates the 3' end of the of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. After this parameter is applied there must still be at least a 12 nucleotide overlap between the forward and reverse reads. If 0 is provided, no truncation or length filtering will be performed[required]
- trim_left_f:
Int Position at which forward read sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default:
0]- trim_left_r:
Int Position at which reverse read sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default:
0]- max_ee_f:
Float Forward reads with number of expected errors higher than this value will be discarded.[default:
2.0]- max_ee_r:
Float Reverse reads with number of expected errors higher than this value will be discarded.[default:
2.0]- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than
trunc_len_fortrunc_len_r(depending on the direction of the read) it is discarded.[default:2]- min_overlap:
Int%Range(4, None) The minimum length of the overlap required for merging the forward and reverse reads.[default:
12]- max_merge_mismatch:
Int The maximum number of mismatches allowed in the overlap region when merging reads. If 0, only exact overlaps are allowed.[default:
0]- trim_overhang:
Bool If TRUE, "overhangs" in the alignment after merging are trimmed off. "Overhangs" are when the reverse read extends past the start of the forward read, and vice-versa, as can happen when reads are longer than the amplicon and read into the other-direction primer region.[default:
False]- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised indpendently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default:
'independent']- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default:
'consensus']- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). This parameter has no effect if chimera_method is "none".[default:
1.0]- allow_one_off:
Bool Bimeras that are one-off from exact are also identified if the
allow_one_offargument is TrueIf True, a sequence will be identified as bimera if it is one mismatch or indel away from an exact bimera.[default:False]- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default:
1]- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default:
1000000]- hashed_feature_ids:
Bool If true, the feature ids in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default:
True]- retain_all_samples:
Bool If True all samples input to dada2 will be retained in the output of dada2, if false samples with zero total frequency are removed from the table.[default:
True]
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence, and these sequences will be the joined paired-end sequences.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
Examples¶
denoise_paired¶
wget -O 'demux-paired.qza' \
'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'
qiime dada2 denoise-paired \
--i-demultiplexed-seqs demux-paired.qza \
--p-trunc-len-f 150 \
--p-trunc-len-r 140 \
--o-representative-sequences representative-sequences.qza \
--o-table table.qza \
--o-denoising-stats denoising-stats.qza \
--o-base-transition-stats base-transition-stats.qzafrom qiime2 import Artifact
from urllib import request
import qiime2.plugins.dada2.actions as dada2_actions
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'
fn = 'demux-paired.qza'
request.urlretrieve(url, fn)
demux_paired = Artifact.load(fn)
table, representative_sequences, denoising_stats, base_transition_stats = dada2_actions.denoise_paired(
demultiplexed_seqs=demux_paired,
trunc_len_f=150,
trunc_len_r=140,
)- Using the
Upload Datatool: - On the first tab (Regular), press the
Paste/Fetchdata button at the bottom.- Set "Name" (first text-field) to:
demux-paired.qza - In the larger text-area, copy-and-paste: https://
amplicon -docs .qiime2 .org /en /latest /data /examples /dada2 /denoise -paired /1 /demux -paired .qza - ("Type", "Genome", and "Settings" can be ignored)
- Set "Name" (first text-field) to:
- Press the
Startbutton at the bottom.
- On the first tab (Regular), press the
- Using the
qiime2 dada2 denoise-pairedtool: - Set "demultiplexed_seqs" to
#: demux-paired.qza - Set "trunc_len_f" to
150 - Set "trunc_len_r" to
140 - Press the
Executebutton.
- Set "demultiplexed_seqs" to
library(reticulate)
Artifact <- import("qiime2")$Artifact
dada2_actions <- import("qiime2.plugins.dada2.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'
fn <- 'demux-paired.qza'
request$urlretrieve(url, fn)
demux_paired <- Artifact$load(fn)
action_results <- dada2_actions$denoise_paired(
demultiplexed_seqs=demux_paired,
trunc_len_f=150L,
trunc_len_r=140L,
)
representative_sequences <- action_results$representative_sequences
table <- action_results$table
denoising_stats <- action_results$denoising_stats
base_transition_stats <- action_results$base_transition_stats
from q2_dada2._examples import denoise_paired
denoise_paired(use)
dada2 denoise-pyro¶
This method denoises single-end pyrosequencing sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality] The single-end demultiplexed pyrosequencing sequences (e.g. 454, IonTorrent) to be denoised.[required]
Parameters¶
- trunc_len:
Int Position at which sequences should be truncated due to decrease in quality. This truncates the 3' end of the of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed[required]
- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the of the input sequences, which will be the bases that were sequenced in the first cycles.[default:
0]- max_ee:
Float Reads with number of expected errors higher than this value will be discarded.[default:
2.0]- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than
trunc_len, it is discarded.[default:2]- max_len:
Int Remove reads prior to trimming or truncation which are longer than this value. If 0 is provided no reads will be removed based on length.[default:
0]- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default: 'independent']
- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default: 'consensus']
- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be at least as abundant as the sequence being tested). This parameter has no effect if chimera_method is "none".[default: 1.0]
- allow_one_off:
Bool If True, a sequence is also identified as a bimera when it is one mismatch or indel away from an exact bimera.[default: False]
- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default: 1]
- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default: 250000]
- hashed_feature_ids:
Bool If True, the feature IDs in the resulting table are hashes of the sequences defining each feature. The hash is always the same for the same sequence, which allows feature tables to be merged across runs of this method. Only merge tables when exactly the same parameters were used for each run.[default: True]
- retain_all_samples:
Bool If True, all samples input to DADA2 are retained in the output; if False, samples with zero total frequency are removed from the table.[default: True]
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
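The hashed_feature_ids description says the same sequence always yields the same ID, which is what makes tables from separate runs mergeable. A minimal sketch of that idea follows. It assumes an MD5 hex digest of the sequence string; this reference does not state which hash the plugin uses, and `feature_id`, `run_a`, and `run_b` are hypothetical names with made-up counts.

```python
import hashlib


def feature_id(sequence: str) -> str:
    """Hypothetical helper: derive a stable ID from a sequence (MD5 is an assumption)."""
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


# Two toy "feature tables" (feature ID -> frequency) from separate runs.
run_a = {feature_id("ACGTACGT"): 10, feature_id("TTGACCAA"): 3}
run_b = {feature_id("ACGTACGT"): 7}

# Because identical sequences hash to identical IDs, merging is a keyed sum.
merged = dict(run_a)
for fid, count in run_b.items():
    merged[fid] = merged.get(fid, 0) + count
```

With sequence-independent IDs the two runs would have no shared keys, so the shared sequence could not be summed this way.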
dada2 denoise-ccs¶
This method denoises single-end Pacbio CCS sequences, dereplicates them, and filters chimeras. Tutorial and workflow: https://
Citations¶
Callahan et al., 2016; Callahan et al., 2019
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality] The single-end demultiplexed PacBio CCS sequences to be denoised.[required]
Parameters¶
- front:
Str Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Can contain IUPAC ambiguous nucleotide codes. Note, primer direction is 5' to 3'. Primers are removed before trim and filter step. Reads that do not contain the primer are discarded. Each read is re-oriented if the reverse complement of the read is a better match to the provided primer sequence. This is recommended for PacBio CCS reads, which come in a random mix of forward and reverse-complement orientations.[required]
- adapter:
Str Sequence of an adapter ligated to the 3' end. The adapter and any preceding bases are trimmed. Can contain IUPAC ambiguous nucleotide codes. Note, primer direction is 5' to 3'. Primers are removed before trim and filter step. Reads that do not contain the primer are discarded.[optional]
- max_mismatch:
Int The number of mismatches to tolerate when matching reads to primer sequences. See http://benjjneb.github.io/dada2/ for complete details.[default: 2]
- indels:
Bool Allow insertions or deletions of bases when matching adapters. Note that primer matching can be significantly slower with this enabled, currently about 4x slower.[default: False]
- trunc_len:
Int Position at which sequences should be truncated due to decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed. Note: because PacBio CCS sequences normally have very high quality scores, there is usually no need to truncate them.[default: 0]
- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- max_ee:
Float Reads with number of expected errors higher than this value will be discarded.[default: 2.0]
- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than trunc_len, it is discarded.[default: 2]
- min_len:
Int Remove reads with length less than min_len. min_len is enforced after trimming and truncation. For 16S PacBio CCS, suggest 1000.[default: 20]
- max_len:
Int Remove reads prior to trimming or truncation which are longer than this value. If 0 is provided no reads will be removed based on length. For 16S PacBio CCS, suggest 1600.[default: 0]
- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default: 'independent']
- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default: 'consensus']
- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be at least as abundant as the sequence being tested). Suggest 3.5. This parameter has no effect if chimera_method is "none".[default: 3.5]
- allow_one_off:
Bool If True, a sequence is also identified as a bimera when it is one mismatch or indel away from an exact bimera.[default: False]
- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default: 1]
- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default: 1000000]
- hashed_feature_ids:
Bool If True, the feature IDs in the resulting table are hashes of the sequences defining each feature. The hash is always the same for the same sequence, which allows feature tables to be merged across runs of this method. Only merge tables when exactly the same parameters were used for each run.[default: True]
- retain_all_samples:
Bool If True, all samples input to DADA2 are retained in the output; if False, samples with zero total frequency are removed from the table.[default: True]
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
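The length-related parameters above apply at different points: max_len is checked before trimming or truncation, and min_len afterwards. The sketch below models only that ordering on read lengths, as a simplified illustration and not the plugin's implementation. `passes_length_filters` is a hypothetical helper, and treating trim_left as a plain subtraction from the truncated length is an assumption.

```python
def passes_length_filters(read_len, trim_left=0, trunc_len=0, min_len=20, max_len=0):
    """Hypothetical model of the documented order of the length filters."""
    # max_len is checked on the raw read; 0 disables the check.
    if max_len and read_len > max_len:
        return False
    # trunc_len discards shorter reads and cuts longer ones; 0 disables it.
    if trunc_len:
        if read_len < trunc_len:
            return False
        read_len = trunc_len
    # trim_left removes bases from the 5' end (assumed simple subtraction).
    read_len -= trim_left
    # min_len is enforced after trimming and truncation.
    return read_len >= min_len


# Using the values suggested above for 16S PacBio CCS reads.
typical = passes_length_filters(1450, min_len=1000, max_len=1600)
too_long = passes_length_filters(1700, min_len=1000, max_len=1600)
too_short = passes_length_filters(900, min_len=1000, max_len=1600)
```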
dada2 plot-base-transitions¶
Generates visualizations of DADA2 output statistics.
Citations¶
Inputs¶
- base_transition_stats:
DADA2BaseTransitionStats DADA2 base transition statistics.[required]
Parameters¶
- nominalq:
Bool Sets the nominalq line of the visualization.[default: False]
- error_in:
Bool Sets the input error line of the visualization.[default: False]
- error_out:
Bool Sets the output error line of the visualization.[default: True]
Outputs¶
- visualization:
Visualization <no description>[required]
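For orientation on the nominalq option: under the standard Phred definition, a quality score Q corresponds to an error probability of 10^(-Q/10). The sketch below computes those nominal rates. It is an illustration only and assumes the nominalq line reflects this definition; it does not reproduce the plugin's plotting, and `nominal_error_rate` is a hypothetical helper.

```python
def nominal_error_rate(q: int) -> float:
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


# Nominal error rates at a few common quality scores.
rates = {q: nominal_error_rate(q) for q in (10, 20, 30, 40)}

for q, rate in rates.items():
    print(f"Q{q}: {rate:.4f}")
```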
dada2 denoise-single¶
This method denoises single-end sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality | PairedEndSequencesWithQuality] The single-end demultiplexed sequences to be denoised.[required]
Parameters¶
- trunc_len:
Int Position at which sequences should be truncated due to decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed.[required]
- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- max_ee:
Float Reads with number of expected errors higher than this value will be discarded.[default: 2.0]
- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than trunc_len, it is discarded.[default: 2]
- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default: 'independent']
- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default: 'consensus']
- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be at least as abundant as the sequence being tested). This parameter has no effect if chimera_method is "none".[default: 1.0]
- allow_one_off:
Bool If True, a sequence is also identified as a bimera when it is one mismatch or indel away from an exact bimera.[default: False]
- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default: 1]
- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default: 1000000]
- hashed_feature_ids:
Bool If True, the feature IDs in the resulting table are hashes of the sequences defining each feature. The hash is always the same for the same sequence, which allows feature tables to be merged across runs of this method. Only merge tables when exactly the same parameters were used for each run.[default: True]
- retain_all_samples:
Bool If True, all samples input to DADA2 are retained in the output; if False, samples with zero total frequency are removed from the table.[default: True]
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
Examples¶
denoise_single¶
wget -O 'demux-single.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'

qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux-single.qza \
  --p-trim-left 0 \
  --p-trunc-len 120 \
  --o-representative-sequences representative-sequences.qza \
  --o-table table.qza \
  --o-denoising-stats denoising-stats.qza \
  --o-base-transition-stats base-transition-stats.qza

from qiime2 import Artifact
from urllib import request
import qiime2.plugins.dada2.actions as dada2_actions

# Download the example input and load it as a QIIME 2 artifact.
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'
fn = 'demux-single.qza'
request.urlretrieve(url, fn)
demux_single = Artifact.load(fn)

# Denoise the reads and unpack the four result artifacts.
table, representative_sequences, denoising_stats, base_transition_stats = dada2_actions.denoise_single(
    demultiplexed_seqs=demux_single,
    trim_left=0,
    trunc_len=120,
)

- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: demux-single.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 dada2 denoise-single tool:
  - Set "demultiplexed_seqs" to #: demux-single.qza
  - Set "trunc_len" to 120
  - Expand the additional options section
    - Leave "trim_left" as its default value of 0
  - Press the Execute button.

library(reticulate)

Artifact <- import("qiime2")$Artifact
dada2_actions <- import("qiime2.plugins.dada2.actions")
request <- import("urllib")$request

# Download the example input and load it as a QIIME 2 artifact.
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'
fn <- 'demux-single.qza'
request$urlretrieve(url, fn)
demux_single <- Artifact$load(fn)

# Integer parameters need the L suffix so they are passed as integers.
action_results <- dada2_actions$denoise_single(
  demultiplexed_seqs=demux_single,
  trim_left=0L,
  trunc_len=120L
)
representative_sequences <- action_results$representative_sequences
table <- action_results$table
denoising_stats <- action_results$denoising_stats
base_transition_stats <- action_results$base_transition_stats

from q2_dada2._examples import denoise_single
denoise_single(use)
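How trunc_q, trunc_len, and max_ee interact in the example above can be sketched on a single read's quality scores. This is a simplified model and not the plugin's code: it assumes expected errors are the sum of per-base Phred error probabilities, and that the filters apply in the order trunc_q, trunc_len, max_ee. The helper names and the toy quality vectors are hypothetical.

```python
def truncate_at_quality(quals, trunc_q=2):
    """Keep only the scores before the first one that is <= trunc_q."""
    for i, q in enumerate(quals):
        if q <= trunc_q:
            return quals[:i]
    return quals


def expected_errors(quals):
    """Sum of per-base error probabilities implied by Phred scores."""
    return sum(10 ** (-q / 10) for q in quals)


def keep_read(quals, trunc_len=120, trunc_q=2, max_ee=2.0):
    """Return True if the read survives the modelled filters."""
    quals = truncate_at_quality(quals, trunc_q)
    if trunc_len:
        # Reads shorter than trunc_len are discarded, longer ones are cut.
        if len(quals) < trunc_len:
            return False
        quals = quals[:trunc_len]
    # Reads with more expected errors than max_ee are discarded.
    return expected_errors(quals) <= max_ee


good = keep_read([30] * 150)                       # 120 bases at Q30
cut_short = keep_read([30] * 100 + [2] + [30] * 50)  # truncated to 100 bases
noisy = keep_read([10] * 150)                      # 120 bases at Q10
```

In this model the first read is kept, the second is dropped because the Q2 base leaves it shorter than trunc_len, and the third is dropped because its expected errors exceed max_ee.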
dada2 denoise-paired¶
This method denoises paired-end sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[PairedEndSequencesWithQuality] The paired-end demultiplexed sequences to be denoised.[required]
Parameters¶
- trunc_len_f:
Int Position at which forward read sequences should be truncated due to decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. After this parameter is applied there must still be at least a 12 nucleotide overlap between the forward and reverse reads. If 0 is provided, no truncation or length filtering will be performed.[required]
- trunc_len_r:
Int Position at which reverse read sequences should be truncated due to decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. After this parameter is applied there must still be at least a 12 nucleotide overlap between the forward and reverse reads. If 0 is provided, no truncation or length filtering will be performed.[required]
- trim_left_f:
Int Position at which forward read sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- trim_left_r:
Int Position at which reverse read sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- max_ee_f:
Float Forward reads with number of expected errors higher than this value will be discarded.[default: 2.0]
- max_ee_r:
Float Reverse reads with number of expected errors higher than this value will be discarded.[default: 2.0]
- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than trunc_len_f or trunc_len_r (depending on the direction of the read) it is discarded.[default: 2]
- min_overlap:
Int%Range(4, None) The minimum length of the overlap required for merging the forward and reverse reads.[default: 12]
- max_merge_mismatch:
Int The maximum number of mismatches allowed in the overlap region when merging reads. If 0, only exact overlaps are allowed.[default: 0]
- trim_overhang:
Bool If True, "overhangs" in the alignment after merging are trimmed off. "Overhangs" are when the reverse read extends past the start of the forward read, and vice-versa, as can happen when reads are longer than the amplicon and read into the other-direction primer region.[default: False]
- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default: 'independent']
- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default: 'consensus']
- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be at least as abundant as the sequence being tested). This parameter has no effect if chimera_method is "none".[default: 1.0]
- allow_one_off:
Bool If True, a sequence is also identified as a bimera when it is one mismatch or indel away from an exact bimera.[default: False]
- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default: 1]
- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default: 1000000]
- hashed_feature_ids:
Bool If True, the feature IDs in the resulting table are hashes of the sequences defining each feature. The hash is always the same for the same sequence, which allows feature tables to be merged across runs of this method. Only merge tables when exactly the same parameters were used for each run.[default: True]
- retain_all_samples:
Bool If True, all samples input to DADA2 are retained in the output; if False, samples with zero total frequency are removed from the table.[default: True]
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence, and these sequences will be the joined paired-end sequences.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
Examples¶
denoise_paired¶
wget -O 'demux-paired.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired.qza \
  --p-trunc-len-f 150 \
  --p-trunc-len-r 140 \
  --o-representative-sequences representative-sequences.qza \
  --o-table table.qza \
  --o-denoising-stats denoising-stats.qza \
  --o-base-transition-stats base-transition-stats.qza

from qiime2 import Artifact
from urllib import request
import qiime2.plugins.dada2.actions as dada2_actions

# Download the example input and load it as a QIIME 2 artifact.
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'
fn = 'demux-paired.qza'
request.urlretrieve(url, fn)
demux_paired = Artifact.load(fn)

# Denoise the reads and unpack the four result artifacts.
table, representative_sequences, denoising_stats, base_transition_stats = dada2_actions.denoise_paired(
    demultiplexed_seqs=demux_paired,
    trunc_len_f=150,
    trunc_len_r=140,
)

- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: demux-paired.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 dada2 denoise-paired tool:
  - Set "demultiplexed_seqs" to #: demux-paired.qza
  - Set "trunc_len_f" to 150
  - Set "trunc_len_r" to 140
  - Press the Execute button.

library(reticulate)

Artifact <- import("qiime2")$Artifact
dada2_actions <- import("qiime2.plugins.dada2.actions")
request <- import("urllib")$request

# Download the example input and load it as a QIIME 2 artifact.
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'
fn <- 'demux-paired.qza'
request$urlretrieve(url, fn)
demux_paired <- Artifact$load(fn)

# Integer parameters need the L suffix so they are passed as integers.
action_results <- dada2_actions$denoise_paired(
  demultiplexed_seqs=demux_paired,
  trunc_len_f=150L,
  trunc_len_r=140L
)
representative_sequences <- action_results$representative_sequences
table <- action_results$table
denoising_stats <- action_results$denoising_stats
base_transition_stats <- action_results$base_transition_stats

from q2_dada2._examples import denoise_paired
denoise_paired(use)
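The truncation lengths in this example must leave enough overlap for the forward and reverse reads to merge. A rough check is sketched below: when the amplicon is longer than either truncated read, the overlap is approximately trunc_len_f + trunc_len_r minus the amplicon length. The 253 nt amplicon length is an assumed value for illustration and is not taken from the example data; `merge_overlap` is a hypothetical helper.

```python
def merge_overlap(amplicon_len, trunc_len_f, trunc_len_r):
    """Approximate overlap (nt) between truncated forward and reverse reads."""
    return trunc_len_f + trunc_len_r - amplicon_len


MIN_OVERLAP = 12  # documented default of min_overlap

# Truncation lengths from the example above, with an assumed 253 nt amplicon.
overlap = merge_overlap(amplicon_len=253, trunc_len_f=150, trunc_len_r=140)
can_merge = overlap >= MIN_OVERLAP

# A longer (assumed) amplicon with the same truncation leaves a gap instead.
gap = merge_overlap(amplicon_len=300, trunc_len_f=150, trunc_len_r=140)
```

A negative result means the truncated reads no longer reach each other, so merging would fail regardless of min_overlap.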
dada2 denoise-ccs¶
This method denoises single-end Pacbio CCS sequences, dereplicates them, and filters chimeras. Tutorial and workflow: https://
Citations¶
Callahan et al., 2016; Callahan et al., 2019
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality] The single-end demultiplexed PacBio CCS sequences to be denoised.[required]
Parameters¶
- front:
Str Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Can contain IUPAC ambiguous nucleotide codes. Note, primer direction is 5' to 3'. Primers are removed before trim and filter step. Reads that do not contain the primer are discarded. Each read is re-oriented if the reverse complement of the read is a better match to the provided primer sequence. This is recommended for PacBio CCS reads, which come in a random mix of forward and reverse-complement orientations.[required]
- adapter:
Str Sequence of an adapter ligated to the 3' end. The adapter and any preceding bases are trimmed. Can contain IUPAC ambiguous nucleotide codes. Note, primer direction is 5' to 3'. Primers are removed before trim and filter step. Reads that do not contain the primer are discarded.[optional]
- max_mismatch:
Int The number of mismatches to tolerate when matching reads to primer sequences - see http://
benjjneb .github .io /dada2/ for complete details.[default: 2]- indels:
Bool Allow insertions or deletions of bases when matching adapters. Note that primer matching can be significantly slower, currently about 4x slower[default:
False]- trunc_len:
Int Position at which sequences should be truncated due to decrease in quality. This truncates the 3' end of the of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed. Note: Since Pacbio CCS sequences were normally with very high quality scores, there is no need to truncate the Pacbio CCS sequences.[default:
0]- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the of the input sequences, which will be the bases that were sequenced in the first cycles.[default:
0]- max_ee:
Float Reads with number of expected errors higher than this value will be discarded.[default:
2.0]- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than
trunc_len, it is discarded.[default:2]- min_len:
Int Remove reads with length less than minLen. minLen is enforced after trimming and truncation. For 16S Pacbio CCS, suggest 1000.[default:
20]- max_len:
Int Remove reads prior to trimming or truncation which are longer than this value. If 0 is provided no reads will be removed based on length. For 16S Pacbio CCS, suggest 1600.[default:
0]- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised indpendently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default:
'independent']- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default:
'consensus']- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). Suggest 3.5. This parameter has no effect if chimera_method is "none".[default:
3.5]- allow_one_off:
Bool Bimeras that are one-off from exact are also identified if the
allow_one_offargument is True. If True, a sequence will be identified as bimera if it is one mismatch or indel away from an exact bimera.[default:False]- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default:
1]- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default:
1000000]- hashed_feature_ids:
Bool If true, the feature ids in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default:
True]- retain_all_samples:
Bool If True all samples input to dada2 will be retained in the output of dada2, if false samples with zero total frequency are removed from the table.[default:
True]
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
dada2 plot-base-transitions¶
Generates visualizations of DADA2 output statistics.
Citations¶
Inputs¶
- base_transition_stats:
DADA2BaseTransitionStats DADA2 base transition statistics.[required]
Parameters¶
- nominalq:
Bool Whether to show the nominal Q line in the visualization.[default: False]
- error_in:
Bool Whether to show the input error line in the visualization.[default: False]
- error_out:
Bool Whether to show the output error line in the visualization.[default: True]
Outputs¶
- visualization:
Visualization <no description>[required]
dada2 denoise-single¶
This method denoises single-end sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality | PairedEndSequencesWithQuality] The single-end demultiplexed sequences to be denoised.[required]
Parameters¶
- trunc_len:
Int Position at which sequences should be truncated due to decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed.[required]
- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- max_ee:
Float Reads with more expected errors than this value will be discarded.[default: 2.0]
- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than trunc_len, it is discarded.[default: 2]
- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default: 'independent']
- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default: 'consensus']
- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). This parameter has no effect if chimera_method is "none".[default: 1.0]
- allow_one_off:
Bool If True, bimeras that are one-off from exact are also identified: a sequence is flagged as a bimera if it is one mismatch or indel away from an exact bimera.[default: False]
- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default: 1]
- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default: 1000000]
- hashed_feature_ids:
Bool If True, the feature IDs in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence, so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default: True]
- retain_all_samples:
Bool If True, all samples input to DADA2 are retained in the output; if False, samples with zero total frequency are removed from the table.[default: True]
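To make the interaction of trunc_q, trunc_len, trim_left, and max_ee concrete, here is a minimal pure-Python sketch of the per-read filtering they describe. It is not the plugin's implementation: the function names are hypothetical, the order of the steps is an assumption, and the expected-error formula used is the standard Phred relation EE = sum(10^(-Q/10)).

```python
def expected_errors(quals):
    """Sum of per-base error probabilities implied by Phred quality scores."""
    return sum(10 ** (-q / 10) for q in quals)


def filter_read(seq, quals, trunc_len, trim_left=0, max_ee=2.0, trunc_q=2):
    """Return the filtered read, or None if the read would be discarded.

    Step order (an assumption for this sketch): trunc_q, trunc_len,
    trim_left, then the max_ee check.
    """
    # Truncate at the first base whose quality is <= trunc_q.
    for i, q in enumerate(quals):
        if q <= trunc_q:
            seq, quals = seq[:i], quals[:i]
            break
    # Truncate the 3' end to trunc_len; shorter reads are discarded.
    if trunc_len > 0:
        if len(seq) < trunc_len:
            return None
        seq, quals = seq[:trunc_len], quals[:trunc_len]
    # Trim the 5' end.
    seq, quals = seq[trim_left:], quals[trim_left:]
    # Discard reads with too many expected errors.
    if expected_errors(quals) > max_ee:
        return None
    return seq


read = "ACGTACGTAC"
print(filter_read(read, [30] * 10, trunc_len=8, trim_left=2))
print(filter_read(read, [30] * 5 + [2] + [30] * 4, trunc_len=8))
print(filter_read(read, [3] * 10, trunc_len=0))
```

Under these assumptions the first read survives with length trunc_len minus trim_left, the second is discarded because the trunc_q cut leaves it shorter than trunc_len, and the third is discarded by max_ee.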
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
Examples¶
denoise_single¶
wget -O 'demux-single.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'

qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux-single.qza \
  --p-trim-left 0 \
  --p-trunc-len 120 \
  --o-representative-sequences representative-sequences.qza \
  --o-table table.qza \
  --o-denoising-stats denoising-stats.qza \
  --o-base-transition-stats base-transition-stats.qza

from qiime2 import Artifact
from urllib import request
import qiime2.plugins.dada2.actions as dada2_actions

url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'
fn = 'demux-single.qza'
request.urlretrieve(url, fn)
demux_single = Artifact.load(fn)

table, representative_sequences, denoising_stats, base_transition_stats = dada2_actions.denoise_single(
    demultiplexed_seqs=demux_single,
    trim_left=0,
    trunc_len=120,
)

- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: demux-single.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 dada2 denoise-single tool:
  - Set "demultiplexed_seqs" to #: demux-single.qza
  - Set "trunc_len" to 120
  - Expand the additional options section
    - Leave "trim_left" as its default value of 0
  - Press the Execute button.

library(reticulate)
Artifact <- import("qiime2")$Artifact
dada2_actions <- import("qiime2.plugins.dada2.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'
fn <- 'demux-single.qza'
request$urlretrieve(url, fn)
demux_single <- Artifact$load(fn)
action_results <- dada2_actions$denoise_single(
demultiplexed_seqs=demux_single,
trim_left=0L,
trunc_len=120L,
)
representative_sequences <- action_results$representative_sequences
table <- action_results$table
denoising_stats <- action_results$denoising_stats
base_transition_stats <- action_results$base_transition_stats
from q2_dada2._examples import denoise_single
denoise_single(use)
dada2 denoise-paired¶
This method denoises paired-end sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[PairedEndSequencesWithQuality] The paired-end demultiplexed sequences to be denoised.[required]
Parameters¶
- trunc_len_f:
Int Position at which forward read sequences should be truncated due to decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. After this parameter is applied there must still be at least a 12 nucleotide overlap between the forward and reverse reads. If 0 is provided, no truncation or length filtering will be performed.[required]
- trunc_len_r:
Int Position at which reverse read sequences should be truncated due to decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. After this parameter is applied there must still be at least a 12 nucleotide overlap between the forward and reverse reads. If 0 is provided, no truncation or length filtering will be performed.[required]
- trim_left_f:
Int Position at which forward read sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- trim_left_r:
Int Position at which reverse read sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- max_ee_f:
Float Forward reads with more expected errors than this value will be discarded.[default: 2.0]
- max_ee_r:
Float Reverse reads with more expected errors than this value will be discarded.[default: 2.0]
- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than trunc_len_f or trunc_len_r (depending on the direction of the read), it is discarded.[default: 2]
- min_overlap:
Int%Range(4, None) The minimum length of the overlap required for merging the forward and reverse reads.[default: 12]
- max_merge_mismatch:
Int The maximum number of mismatches allowed in the overlap region when merging reads. If 0, only exact overlaps are allowed.[default: 0]
- trim_overhang:
Bool If True, "overhangs" in the alignment after merging are trimmed off. "Overhangs" occur when the reverse read extends past the start of the forward read, and vice versa, as can happen when reads are longer than the amplicon and read into the other-direction primer region.[default: False]
- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default: 'independent']
- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default: 'consensus']
- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). This parameter has no effect if chimera_method is "none".[default: 1.0]
- allow_one_off:
Bool If True, bimeras that are one-off from exact are also identified: a sequence is flagged as a bimera if it is one mismatch or indel away from an exact bimera.[default: False]
- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default: 1]
- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default: 1000000]
- hashed_feature_ids:
Bool If True, the feature IDs in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence, so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default: True]
- retain_all_samples:
Bool If True, all samples input to DADA2 are retained in the output; if False, samples with zero total frequency are removed from the table.[default: True]
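The roles of min_overlap and max_merge_mismatch can be sketched with a simplified, ungapped merge of one read pair. This is only an illustration under stated assumptions: DADA2 itself aligns the reads rather than sliding them as done here, the helper names are hypothetical, and where bases disagree this sketch simply keeps the forward base.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward, reverse, min_overlap=12, max_merge_mismatch=0):
    """Merge a read pair on the longest acceptable ungapped overlap.

    Returns the merged sequence, or None if no overlap of at least
    `min_overlap` bases has at most `max_merge_mismatch` mismatches.
    """
    rc = reverse_complement(reverse)
    for overlap in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        tail, head = forward[-overlap:], rc[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_merge_mismatch:
            return forward + rc[overlap:]
    return None


# Toy 30 nt amplicon; the two reads share a 12 nt overlap (positions 8-19).
amplicon = "ACGTTGCAAGCTTAGGCTAACCGGTTAACG"
forward = amplicon[:20]
reverse = reverse_complement(amplicon[8:])
print(merge_pair(forward, reverse) == amplicon)
print(merge_pair(forward, reverse, min_overlap=13))
```

With the default min_overlap of 12 the toy pair merges back into the amplicon, while requiring 13 bases of overlap causes the pair to be dropped. This is why truncating too aggressively with trunc_len_f and trunc_len_r can leave too little overlap to merge.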
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence, and these sequences will be the joined paired-end sequences.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
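The retention percentages reported in denoising_stats can be reproduced from the per-stage read counts. The sketch below uses made-up counts and hypothetical stage labels for a single sample. It only shows the arithmetic (reads remaining divided by input reads), not how the plugin builds the table.

```python
def retention_percentages(counts):
    """Map each stage after the first to the percentage of input reads retained.

    `counts` is an ordered mapping of stage name -> reads remaining, with the
    input read count first.
    """
    stages = list(counts)
    total = counts[stages[0]]
    return {
        stage: round(100 * counts[stage] / total, 2) if total else 0.0
        for stage in stages[1:]
    }


# Made-up counts for one sample; the stage names are illustrative labels.
sample_counts = {
    "input": 1000,
    "filtered": 900,
    "denoised": 850,
    "merged": 800,
    "non-chimeric": 750,
}
print(retention_percentages(sample_counts))
```

A sharp drop at one stage points at the parameter controlling it, for example a low "merged" percentage suggests the truncation lengths leave too little overlap.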
Examples¶
denoise_paired¶
wget -O 'demux-paired.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired.qza \
  --p-trunc-len-f 150 \
  --p-trunc-len-r 140 \
  --o-representative-sequences representative-sequences.qza \
  --o-table table.qza \
  --o-denoising-stats denoising-stats.qza \
  --o-base-transition-stats base-transition-stats.qza

from qiime2 import Artifact
from urllib import request
import qiime2.plugins.dada2.actions as dada2_actions

url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'
fn = 'demux-paired.qza'
request.urlretrieve(url, fn)
demux_paired = Artifact.load(fn)

table, representative_sequences, denoising_stats, base_transition_stats = dada2_actions.denoise_paired(
    demultiplexed_seqs=demux_paired,
    trunc_len_f=150,
    trunc_len_r=140,
)

- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: demux-paired.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 dada2 denoise-paired tool:
  - Set "demultiplexed_seqs" to #: demux-paired.qza
  - Set "trunc_len_f" to 150
  - Set "trunc_len_r" to 140
  - Press the Execute button.

library(reticulate)
Artifact <- import("qiime2")$Artifact
dada2_actions <- import("qiime2.plugins.dada2.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'
fn <- 'demux-paired.qza'
request$urlretrieve(url, fn)
demux_paired <- Artifact$load(fn)
action_results <- dada2_actions$denoise_paired(
demultiplexed_seqs=demux_paired,
trunc_len_f=150L,
trunc_len_r=140L,
)
representative_sequences <- action_results$representative_sequences
table <- action_results$table
denoising_stats <- action_results$denoising_stats
base_transition_stats <- action_results$base_transition_stats
from q2_dada2._examples import denoise_paired
denoise_paired(use)
dada2 denoise-pyro¶
This method denoises single-end pyrosequencing sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality] The single-end demultiplexed pyrosequencing sequences (e.g. 454, IonTorrent) to be denoised.[required]
Parameters¶
- trunc_len:
Int Position at which sequences should be truncated due to decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed.[required]
- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- max_ee:
Float Reads with more expected errors than this value will be discarded.[default: 2.0]
- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than trunc_len, it is discarded.[default: 2]
- max_len:
Int Remove reads prior to trimming or truncation which are longer than this value. If 0 is provided, no reads will be removed based on length.[default: 0]
- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default: 'independent']
- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default: 'consensus']
- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). This parameter has no effect if chimera_method is "none".[default: 1.0]
- allow_one_off:
Bool If True, bimeras that are one-off from exact are also identified: a sequence is flagged as a bimera if it is one mismatch or indel away from an exact bimera.[default: False]
- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default: 1]
- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default: 250000]
- hashed_feature_ids:
Bool If True, the feature IDs in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence, so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default: True]
- retain_all_samples:
Bool If True, all samples input to DADA2 are retained in the output; if False, samples with zero total frequency are removed from the table.[default: True]
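The property that hashed_feature_ids relies on, namely that the same sequence always yields the same ID, can be shown with a short sketch. The choice of MD5 here is an assumption made for illustration, as is the helper name `feature_id`; the text above only states that IDs are hashes of the defining sequences.

```python
import hashlib


def feature_id(sequence):
    """Return a deterministic hex digest for a sequence (MD5 assumed here)."""
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


run_1 = feature_id("ACGTACGTACGT")
run_2 = feature_id("ACGTACGTACGT")   # same sequence seen in a second run
other = feature_id("ACGTACGTACGA")   # a sequence differing by one base

print(run_1 == run_2)   # identical sequences share an ID across runs
print(run_1 == other)   # different sequences get different IDs
```

Because the ID depends only on the sequence, tables from separate runs line up on shared features when merged. Sequences trimmed or truncated differently are different strings and therefore get different IDs, which is why identical parameters across runs matter.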
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
dada2 denoise-ccs¶
This method denoises single-end PacBio CCS sequences, dereplicates them, and filters chimeras. Tutorial and workflow: https://
Citations¶
Callahan et al., 2016; Callahan et al., 2019
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality] The single-end demultiplexed PacBio CCS sequences to be denoised.[required]
Parameters¶
- front:
Str Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Can contain IUPAC ambiguous nucleotide codes. Note, primer direction is 5' to 3'. Primers are removed before the trim and filter step. Reads that do not contain the primer are discarded. Each read is re-oriented if the reverse complement of the read is a better match to the provided primer sequence. This is recommended for PacBio CCS reads, which come in a random mix of forward and reverse-complement orientations.[required]
- adapter:
Str Sequence of an adapter ligated to the 3' end. The adapter and any preceding bases are trimmed. Can contain IUPAC ambiguous nucleotide codes. Note, primer direction is 5' to 3'. Primers are removed before the trim and filter step. Reads that do not contain the primer are discarded.[optional]
- max_mismatch:
Int The number of mismatches to tolerate when matching reads to primer sequences; see http://benjjneb.github.io/dada2/ for complete details.[default: 2]
- indels:
Bool Allow insertions or deletions of bases when matching adapters. Note that primer matching is then significantly slower, currently about 4x.[default: False]
- trunc_len:
Int Position at which sequences should be truncated due to decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed. Note: because PacBio CCS sequences normally have very high quality scores, there is usually no need to truncate them.[default: 0]
- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- max_ee:
Float Reads with more expected errors than this value will be discarded.[default: 2.0]
- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than trunc_len, it is discarded.[default: 2]
- min_len:
Int Remove reads shorter than this value. The minimum length is enforced after trimming and truncation. For 16S PacBio CCS, 1000 is suggested.[default: 20]
- max_len:
Int Remove reads prior to trimming or truncation which are longer than this value. If 0 is provided, no reads will be removed based on length. For 16S PacBio CCS, 1600 is suggested.[default: 0]
- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default: 'independent']
- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default: 'consensus']
- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). A value of 3.5 is suggested. This parameter has no effect if chimera_method is "none".[default: 3.5]
- allow_one_off:
Bool If True, bimeras that are one-off from exact are also identified: a sequence is flagged as a bimera if it is one mismatch or indel away from an exact bimera.[default: False]
- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default: 1]
- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default: 1000000]
- hashed_feature_ids:
Bool If True, the feature IDs in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence, so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default: True]
- retain_all_samples:
Bool If True, all samples input to DADA2 are retained in the output; if False, samples with zero total frequency are removed from the table.[default: True]
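The behaviour described for front and max_mismatch (IUPAC-aware primer matching, re-orientation of reverse-complemented reads, and discarding reads without the primer) can be sketched as follows. This is a simplified illustration, not DADA2's primer removal: it assumes the primer sits at the very start of the read, ignores indels, and uses hypothetical helper names and a toy primer.

```python
# IUPAC nucleotide codes mapped to the bases they match.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def primer_mismatches(primer, prefix):
    """Count positions where the read base is not allowed by the primer code."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, prefix))


def orient_and_trim(read, front, max_mismatch=2):
    """Return the read in primer orientation with the primer removed.

    Both orientations are tried and the better match is kept; returns None
    when neither orientation matches within `max_mismatch`.
    """
    best = None
    for candidate in (read, reverse_complement(read)):
        if len(candidate) < len(front):
            continue
        mm = primer_mismatches(front, candidate[:len(front)])
        if mm <= max_mismatch and (best is None or mm < best[0]):
            best = (mm, candidate[len(front):])
    return None if best is None else best[1]


front = "AGRGTTYGAT"                      # toy primer with ambiguity codes
read = "AGAGTTTGAT" + "CCGGAATT"          # primer followed by an insert
print(orient_and_trim(read, front))
print(orient_and_trim(reverse_complement(read), front))
print(orient_and_trim("T" * 18, front))
```

Both the forward read and its reverse complement yield the same trimmed insert, while the read lacking the primer is discarded.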
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
dada2 plot-base-transitions¶
Generates visualizations of DADA2 output statistics.
Citations¶
Inputs¶
- base_transition_stats:
DADA2BaseTransitionStats DADA2 base transition statistics.[required]
Parameters¶
- nominalq:
Bool Whether to show the nominal Q line in the visualization.[default: False]
- error_in:
Bool Whether to show the input error line in the visualization.[default: False]
- error_out:
Bool Whether to show the output error line in the visualization.[default: True]
Outputs¶
- visualization:
Visualization <no description>[required]
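For orientation when reading the nominalq line, the nominal error probability implied by a Phred quality score Q is 10^(-Q/10). The sketch below computes that value, plus a per-substitution rate under the assumption that the error is split evenly across the three possible substitutions. The function names are hypothetical and this is not the plotting code itself.

```python
def nominal_error_rate(q):
    """Error probability implied by Phred quality score `q`."""
    return 10 ** (-q / 10)


def nominal_substitution_rate(q):
    """Per-substitution rate, assuming errors are spread evenly over the
    three possible substitutions (e.g. A->C, A->G, A->T)."""
    return nominal_error_rate(q) / 3


for q in (10, 20, 30, 40):
    print(q, nominal_error_rate(q), nominal_substitution_rate(q))
```

Comparing the learned error rates against these nominal values shows how far the instrument's reported quality scores are from the observed transition rates.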
dada2 denoise-single¶
This method denoises single-end sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality | PairedEndSequencesWithQuality] The single-end demultiplexed sequences to be denoised.[required]
Parameters¶
- trunc_len:
Int Position at which sequences should be truncated due to decrease in quality. This truncates the 3' end of the of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed[required]
- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the of the input sequences, which will be the bases that were sequenced in the first cycles.[default:
0]- max_ee:
Float Reads with number of expected errors higher than this value will be discarded.[default:
2.0]- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than
trunc_len, it is discarded.[default:2]- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default:
'independent']- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default:
'consensus']- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). This parameter has no effect if chimera_method is "none".[default:
1.0]- allow_one_off:
Bool Bimeras that are one-off from exact are also identified if the
allow_one_offargument is True.If True, a sequence will be identified as bimera if it is one mismatch or indel away from an exact bimera.[default:False]- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default:
1]- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default:
1000000]- hashed_feature_ids:
Bool If true, the feature ids in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default:
True]- retain_all_samples:
Bool If True all samples input to dada2 will be retained in the output of dada2, if false samples with zero total frequency are removed from the table.[default:
True]
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
Examples¶
denoise_single¶
# Command line: download the example data, then denoise it.
wget -O 'demux-single.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'

qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux-single.qza \
  --p-trim-left 0 \
  --p-trunc-len 120 \
  --o-representative-sequences representative-sequences.qza \
  --o-table table.qza \
  --o-denoising-stats denoising-stats.qza \
  --o-base-transition-stats base-transition-stats.qza

# Python API: the same download and denoising step.
from qiime2 import Artifact
from urllib import request
import qiime2.plugins.dada2.actions as dada2_actions

url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'
fn = 'demux-single.qza'
request.urlretrieve(url, fn)
demux_single = Artifact.load(fn)

table, representative_sequences, denoising_stats, base_transition_stats = dada2_actions.denoise_single(
    demultiplexed_seqs=demux_single,
    trim_left=0,
    trunc_len=120,
)

- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: demux-single.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 dada2 denoise-single tool:
  - Set "demultiplexed_seqs" to #: demux-single.qza
  - Set "trunc_len" to 120
  - Expand the additional options section
    - Leave "trim_left" as its default value of 0
  - Press the Execute button.

library(reticulate)
Artifact <- import("qiime2")$Artifact
dada2_actions <- import("qiime2.plugins.dada2.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'
fn <- 'demux-single.qza'
request$urlretrieve(url, fn)
demux_single <- Artifact$load(fn)
action_results <- dada2_actions$denoise_single(
  demultiplexed_seqs=demux_single,
  trim_left=0L,
  trunc_len=120L
)
representative_sequences <- action_results$representative_sequences
table <- action_results$table
denoising_stats <- action_results$denoising_stats
base_transition_stats <- action_results$base_transition_stats
from q2_dada2._examples import denoise_single
denoise_single(use)
dada2 denoise-paired¶
This method denoises paired-end sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[PairedEndSequencesWithQuality] The paired-end demultiplexed sequences to be denoised.[required]
Parameters¶
- trunc_len_f:
Int Position at which forward read sequences should be truncated due to decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. After this parameter is applied there must still be at least a 12 nucleotide overlap between the forward and reverse reads. If 0 is provided, no truncation or length filtering will be performed.[required]
- trunc_len_r:
Int Position at which reverse read sequences should be truncated due to decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. After this parameter is applied there must still be at least a 12 nucleotide overlap between the forward and reverse reads. If 0 is provided, no truncation or length filtering will be performed.[required]
- trim_left_f:
Int Position at which forward read sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- trim_left_r:
Int Position at which reverse read sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- max_ee_f:
Float Forward reads with number of expected errors higher than this value will be discarded.[default: 2.0]
- max_ee_r:
Float Reverse reads with number of expected errors higher than this value will be discarded.[default: 2.0]
- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than trunc_len_f or trunc_len_r (depending on the direction of the read), it is discarded.[default: 2]
- min_overlap:
Int%Range(4, None) The minimum length of the overlap required for merging the forward and reverse reads.[default: 12]
- max_merge_mismatch:
Int The maximum number of mismatches allowed in the overlap region when merging reads. If 0, only exact overlaps are allowed.[default: 0]
- trim_overhang:
Bool If True, "overhangs" in the alignment after merging are trimmed off. "Overhangs" are when the reverse read extends past the start of the forward read, and vice-versa, as can happen when reads are longer than the amplicon and read into the other-direction primer region.[default: False]
- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default: 'independent']
- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default: 'consensus']
- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). This parameter has no effect if chimera_method is "none".[default: 1.0]
- allow_one_off:
Bool If True, a sequence will be identified as a bimera if it is one mismatch or indel away from an exact bimera.[default: False]
- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default: 1]
- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default: 1000000]
- hashed_feature_ids:
Bool If True, the feature IDs in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence, so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default: True]
- retain_all_samples:
Bool If True, all samples input to DADA2 will be retained in the output; if False, samples with zero total frequency are removed from the table.[default: True]
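The overlap requirement on the truncation lengths can be checked with simple arithmetic before running the method. The sketch below is only an illustration: the helper names and the amplicon length are hypothetical, and it assumes both reads start at opposite ends of the amplicon and are at least as long as the chosen truncation lengths.

```python
def merge_overlap(trunc_len_f: int, trunc_len_r: int, amplicon_len: int) -> int:
    """Bases shared by the truncated forward and reverse reads.

    Assumes the forward read starts at one end of the amplicon and the
    reverse read at the other (an assumption of this sketch).
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def can_merge(trunc_len_f: int, trunc_len_r: int, amplicon_len: int,
              min_overlap: int = 12) -> bool:
    """True if the truncated reads still overlap by at least min_overlap."""
    return merge_overlap(trunc_len_f, trunc_len_r, amplicon_len) >= min_overlap


# Hypothetical amplicon lengths, using the truncation values from the example.
print(can_merge(150, 140, amplicon_len=253))  # overlap of 37 bases
print(can_merge(150, 140, amplicon_len=280))  # overlap of 10 bases
```

When a truncation length of 0 is used, no truncation happens, so the actual read length would take the place of that value in the calculation.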
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence, and these sequences will be the joined paired-end sequences.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
Examples¶
denoise_paired¶
# Command line: download the example data, then denoise it.
wget -O 'demux-paired.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired.qza \
  --p-trunc-len-f 150 \
  --p-trunc-len-r 140 \
  --o-representative-sequences representative-sequences.qza \
  --o-table table.qza \
  --o-denoising-stats denoising-stats.qza \
  --o-base-transition-stats base-transition-stats.qza

# Python API: the same download and denoising step.
from qiime2 import Artifact
from urllib import request
import qiime2.plugins.dada2.actions as dada2_actions

url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'
fn = 'demux-paired.qza'
request.urlretrieve(url, fn)
demux_paired = Artifact.load(fn)

table, representative_sequences, denoising_stats, base_transition_stats = dada2_actions.denoise_paired(
    demultiplexed_seqs=demux_paired,
    trunc_len_f=150,
    trunc_len_r=140,
)

- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: demux-paired.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 dada2 denoise-paired tool:
  - Set "demultiplexed_seqs" to #: demux-paired.qza
  - Set "trunc_len_f" to 150
  - Set "trunc_len_r" to 140
  - Press the Execute button.

library(reticulate)
Artifact <- import("qiime2")$Artifact
dada2_actions <- import("qiime2.plugins.dada2.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'
fn <- 'demux-paired.qza'
request$urlretrieve(url, fn)
demux_paired <- Artifact$load(fn)
action_results <- dada2_actions$denoise_paired(
  demultiplexed_seqs=demux_paired,
  trunc_len_f=150L,
  trunc_len_r=140L
)
representative_sequences <- action_results$representative_sequences
table <- action_results$table
denoising_stats <- action_results$denoising_stats
base_transition_stats <- action_results$base_transition_stats
from q2_dada2._examples import denoise_paired
denoise_paired(use)
dada2 denoise-pyro¶
This method denoises single-end pyrosequencing sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality] The single-end demultiplexed pyrosequencing sequences (e.g. 454, IonTorrent) to be denoised.[required]
Parameters¶
- trunc_len:
Int Position at which sequences should be truncated due to decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed.[required]
- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- max_ee:
Float Reads with number of expected errors higher than this value will be discarded.[default: 2.0]
- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than trunc_len, it is discarded.[default: 2]
- max_len:
Int Remove reads prior to trimming or truncation which are longer than this value. If 0 is provided, no reads will be removed based on length.[default: 0]
- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default: 'independent']
- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default: 'consensus']
- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). This parameter has no effect if chimera_method is "none".[default: 1.0]
- allow_one_off:
Bool If True, a sequence will be identified as a bimera if it is one mismatch or indel away from an exact bimera.[default: False]
- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default: 1]
- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default: 250000]
- hashed_feature_ids:
Bool If True, the feature IDs in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence, so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default: True]
- retain_all_samples:
Bool If True, all samples input to DADA2 will be retained in the output; if False, samples with zero total frequency are removed from the table.[default: True]
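How trunc_q, trunc_len, and max_ee interact can be sketched for a single read. This is a simplified illustration under stated assumptions, not the DADA2 implementation: expected errors are taken as the sum of 10^(-Q/10) over the Phred scores of the retained bases, the order of the steps is chosen for readability, and trim_left and max_len are left out. The function names are hypothetical.

```python
def expected_errors(quals):
    """Sum of per-base error probabilities implied by Phred scores."""
    return sum(10 ** (-q / 10) for q in quals)


def truncate_at_quality(seq, quals, trunc_q=2):
    """Cut the read at the first base whose quality is <= trunc_q."""
    for i, q in enumerate(quals):
        if q <= trunc_q:
            return seq[:i], quals[:i]
    return seq, quals


def passes_filter(seq, quals, trunc_len, max_ee=2.0, trunc_q=2):
    """Apply the quality truncation, length, and expected-error checks."""
    seq, quals = truncate_at_quality(seq, quals, trunc_q)
    if trunc_len > 0:
        if len(seq) < trunc_len:
            return False  # too short after quality truncation
        seq, quals = seq[:trunc_len], quals[:trunc_len]
    # Reads with more expected errors than max_ee are discarded.
    return expected_errors(quals) <= max_ee


# A read with a quality-2 base at position 3 is cut to 2 bases and then
# rejected because it is shorter than trunc_len=3.
print(passes_filter("ACGT", [30, 30, 2, 30], trunc_len=3))
# A uniformly high-quality read passes.
print(passes_filter("ACGT", [30, 30, 30, 30], trunc_len=3))
```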
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
dada2 denoise-ccs¶
This method denoises single-end Pacbio CCS sequences, dereplicates them, and filters chimeras. Tutorial and workflow: https://
Citations¶
Callahan et al., 2016; Callahan et al., 2019
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality] The single-end demultiplexed PacBio CCS sequences to be denoised.[required]
Parameters¶
- front:
Str Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Can contain IUPAC ambiguous nucleotide codes. Note, primer direction is 5' to 3'. Primers are removed before the trim and filter step. Reads that do not contain the primer are discarded. Each read is re-oriented if the reverse complement of the read is a better match to the provided primer sequence. This is recommended for PacBio CCS reads, which come in a random mix of forward and reverse-complement orientations.[required]
- adapter:
Str Sequence of an adapter ligated to the 3' end. The adapter and any preceding bases are trimmed. Can contain IUPAC ambiguous nucleotide codes. Note, primer direction is 5' to 3'. Primers are removed before the trim and filter step. Reads that do not contain the primer are discarded.[optional]
- max_mismatch:
Int The number of mismatches to tolerate when matching reads to primer sequences. See http://benjjneb.github.io/dada2/ for complete details.[default: 2]
- indels:
Bool Allow insertions or deletions of bases when matching adapters. Note that primer matching can be significantly slower with this option enabled, currently about 4x slower.[default: False]
- trunc_len:
Int Position at which sequences should be truncated due to decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed. Note: because PacBio CCS sequences normally have very high quality scores, there is usually no need to truncate them.[default: 0]
- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default: 0]
- max_ee:
Float Reads with number of expected errors higher than this value will be discarded.[default: 2.0]
- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than trunc_len, it is discarded.[default: 2]
- min_len:
Int Remove reads with length less than this value. The minimum length is enforced after trimming and truncation. For 16S PacBio CCS, 1000 is suggested.[default: 20]
- max_len:
Int Remove reads prior to trimming or truncation which are longer than this value. If 0 is provided, no reads will be removed based on length. For 16S PacBio CCS, 1600 is suggested.[default: 0]
- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default: 'independent']
- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default: 'consensus']
- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). The suggested value is 3.5. This parameter has no effect if chimera_method is "none".[default: 3.5]
- allow_one_off:
Bool If True, a sequence will be identified as a bimera if it is one mismatch or indel away from an exact bimera.[default: False]
- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default: 1]
- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default: 1000000]
- hashed_feature_ids:
Bool If True, the feature IDs in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence, so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default: True]
- retain_all_samples:
Bool If True, all samples input to DADA2 will be retained in the output; if False, samples with zero total frequency are removed from the table.[default: True]
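The primer handling described for the front parameter (IUPAC codes, a mismatch tolerance, and re-orientation of reverse-complemented reads) can be illustrated with a small pure-Python sketch. It is a toy model, not the DADA2 primer-removal routine: the function names are hypothetical, only mismatches (no indels) are considered, and only the 5' primer is handled.

```python
# Bases accepted by each IUPAC nucleotide code.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(primer, segment):
    """Count positions where the read base is not allowed by the primer code."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, segment))


def best_hit(read, primer):
    """Return (mismatch count, end index) of the best primer placement, or None."""
    best = None
    for start in range(len(read) - len(primer) + 1):
        mm = mismatches(primer, read[start:start + len(primer)])
        if best is None or mm < best[0]:
            best = (mm, start + len(primer))
    return best


def orient_and_trim(read, primer, max_mismatch=2):
    """Pick the orientation that matches the primer best, then trim the
    primer and any preceding bases. Returns None if neither orientation
    matches within max_mismatch (the read would be discarded)."""
    hits = []
    for candidate in (read, reverse_complement(read)):
        hit = best_hit(candidate, primer)
        if hit is not None:
            hits.append((hit[0], hit[1], candidate))
    if not hits:
        return None
    mm, end, oriented = min(hits, key=lambda h: h[0])
    return oriented[end:] if mm <= max_mismatch else None


primer = "AGRGTT"  # hypothetical primer; R matches A or G
forward_read = "AGAGTT" + "CCCTTTGG"
print(orient_and_trim(forward_read, primer))
print(orient_and_trim(reverse_complement(forward_read), primer))
```

Both calls return the same trimmed sequence, which is the point of re-orienting reads before denoising: reads from either strand end up in a common orientation.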
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
dada2 plot-base-transitions¶
Generates visualizations of DADA2 output statistics.
Citations¶
Inputs¶
- base_transition_stats:
DADA2BaseTransitionStats DADA2 base transition statistics.[required]
Parameters¶
- nominalq:
Bool Whether to show the nominalq line in the visualization.[default: False]
- error_in:
Bool Whether to show the input error line in the visualization.[default: False]
- error_out:
Bool Whether to show the output error line in the visualization.[default: True]
Outputs¶
- visualization:
Visualization <no description>[required]
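As background for the nominalq option, the sketch below computes the error rate that a Phred quality score nominally implies, using the standard definition Q = -10 * log10(p). That the nominalq line plots this quantity is an assumption based on the usual meaning of nominal quality in DADA2 error plots; the function name is hypothetical.

```python
def nominal_error_rate(q: float) -> float:
    """Error probability implied by a Phred quality score Q."""
    return 10 ** (-q / 10)


# Nominal error rates for a few common quality scores.
for q in (10, 20, 30, 40):
    print(q, nominal_error_rate(q))
```

Comparing learned transition rates against this curve shows where the reported quality scores over- or under-state the actual error rate.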
This QIIME 2 plugin wraps DADA2 and supports sequence quality control for single-end and paired-end reads using the DADA2 R library.
- version:
2026.1.0.dev0 - website: http://
benjjneb .github .io /dada2/ - user support:
- Please post to the QIIME 2 forum for help with this plugin: https://
forum .qiime2 .org - citations:
- Callahan et al., 2016
Actions¶
| Name | Type | Short Description |
|---|---|---|
| denoise-single | method | Denoise and dereplicate single-end sequences |
| denoise-paired | method | Denoise and dereplicate paired-end sequences |
| denoise-pyro | method | Denoise and dereplicate single-end pyrosequences |
| denoise-ccs | method | Denoise and dereplicate single-end Pacbio CCS |
| plot-base-transitions | visualizer | DADA2 diagnostic statistics |
Artifact Classes¶
SampleData[DADA2Stats] |
DADA2BaseTransitionStats |
Formats¶
DADA2StatsFormat |
DADA2StatsDirFmt |
DADA2BaseTransitionStatsFormat |
DADA2BaseTransitionStatsDirFmt |
dada2 denoise-single¶
This method denoises single-end sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality | PairedEndSequencesWithQuality] The single-end demultiplexed sequences to be denoised.[required]
Parameters¶
- trunc_len:
Int Position at which sequences should be truncated due to decrease in quality. This truncates the 3' end of the of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed[required]
- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the of the input sequences, which will be the bases that were sequenced in the first cycles.[default:
0]- max_ee:
Float Reads with number of expected errors higher than this value will be discarded.[default:
2.0]- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than
trunc_len, it is discarded.[default:2]- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default:
'independent']- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default:
'consensus']- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). This parameter has no effect if chimera_method is "none".[default:
1.0]- allow_one_off:
Bool Bimeras that are one-off from exact are also identified if the
allow_one_offargument is True.If True, a sequence will be identified as bimera if it is one mismatch or indel away from an exact bimera.[default:False]- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default:
1]- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default:
1000000]- hashed_feature_ids:
Bool If true, the feature ids in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default:
True]- retain_all_samples:
Bool If True all samples input to dada2 will be retained in the output of dada2, if false samples with zero total frequency are removed from the table.[default:
True]
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
Examples¶
denoise_single¶
wget -O 'demux-single.qza' \
'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'
qiime dada2 denoise-single \
--i-demultiplexed-seqs demux-single.qza \
--p-trim-left 0 \
--p-trunc-len 120 \
--o-representative-sequences representative-sequences.qza \
--o-table table.qza \
--o-denoising-stats denoising-stats.qza \
--o-base-transition-stats base-transition-stats.qzafrom qiime2 import Artifact
from urllib import request
import qiime2.plugins.dada2.actions as dada2_actions
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'
fn = 'demux-single.qza'
request.urlretrieve(url, fn)
demux_single = Artifact.load(fn)
table, representative_sequences, denoising_stats, base_transition_stats = dada2_actions.denoise_single(
demultiplexed_seqs=demux_single,
trim_left=0,
trunc_len=120,
)- Using the
Upload Datatool: - On the first tab (Regular), press the
Paste/Fetchdata button at the bottom.- Set "Name" (first text-field) to:
demux-single.qza - In the larger text-area, copy-and-paste: https://
amplicon -docs .qiime2 .org /en /latest /data /examples /dada2 /denoise -single /1 /demux -single .qza - ("Type", "Genome", and "Settings" can be ignored)
- Set "Name" (first text-field) to:
- Press the
Startbutton at the bottom.
- On the first tab (Regular), press the
- Using the
qiime2 dada2 denoise-singletool: - Set "demultiplexed_seqs" to
#: demux-single.qza - Set "trunc_len" to
120 - Expand the
additional optionssection- Leave "trim_left" as its default value of
0
- Leave "trim_left" as its default value of
- Press the
Executebutton.
- Set "demultiplexed_seqs" to
library(reticulate)
Artifact <- import("qiime2")$Artifact
dada2_actions <- import("qiime2.plugins.dada2.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-single/1/demux-single.qza'
fn <- 'demux-single.qza'
request$urlretrieve(url, fn)
demux_single <- Artifact$load(fn)
action_results <- dada2_actions$denoise_single(
demultiplexed_seqs=demux_single,
trim_left=0L,
trunc_len=120L,
)
representative_sequences <- action_results$representative_sequences
table <- action_results$table
denoising_stats <- action_results$denoising_stats
base_transition_stats <- action_results$base_transition_stats
from q2_dada2._examples import denoise_single
denoise_single(use)
dada2 denoise-paired¶
This method denoises paired-end sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[PairedEndSequencesWithQuality] The paired-end demultiplexed sequences to be denoised.[required]
Parameters¶
- trunc_len_f:
Int Position at which forward read sequences should be truncated due to decrease in quality. This truncates the 3' end of the of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. After this parameter is applied there must still be at least a 12 nucleotide overlap between the forward and reverse reads. If 0 is provided, no truncation or length filtering will be performed[required]
- trunc_len_r:
Int Position at which reverse read sequences should be truncated due to decrease in quality. This truncates the 3' end of the of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. After this parameter is applied there must still be at least a 12 nucleotide overlap between the forward and reverse reads. If 0 is provided, no truncation or length filtering will be performed[required]
- trim_left_f:
Int Position at which forward read sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default:
0]- trim_left_r:
Int Position at which reverse read sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default:
0]- max_ee_f:
Float Forward reads with number of expected errors higher than this value will be discarded.[default:
2.0]- max_ee_r:
Float Reverse reads with number of expected errors higher than this value will be discarded.[default:
2.0]- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than
trunc_len_fortrunc_len_r(depending on the direction of the read) it is discarded.[default:2]- min_overlap:
Int%Range(4, None) The minimum length of the overlap required for merging the forward and reverse reads.[default:
12]- max_merge_mismatch:
Int The maximum number of mismatches allowed in the overlap region when merging reads. If 0, only exact overlaps are allowed.[default:
0]- trim_overhang:
Bool If TRUE, "overhangs" in the alignment after merging are trimmed off. "Overhangs" are when the reverse read extends past the start of the forward read, and vice-versa, as can happen when reads are longer than the amplicon and read into the other-direction primer region.[default:
False]- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised indpendently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default:
'independent']- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default:
'consensus']- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). This parameter has no effect if chimera_method is "none".[default:
1.0]- allow_one_off:
Bool Bimeras that are one-off from exact are also identified if the
allow_one_offargument is TrueIf True, a sequence will be identified as bimera if it is one mismatch or indel away from an exact bimera.[default:False]- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default:
1]- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default:
1000000]- hashed_feature_ids:
Bool If true, the feature ids in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default:
True]- retain_all_samples:
Bool If True, all samples input to dada2 will be retained in the output of dada2. If False, samples with zero total frequency are removed from the table.[default:
True]
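The interaction between min_overlap and max_merge_mismatch can be illustrated with a minimal sketch. This is a simplified, ungapped scan written for illustration only; it is not how DADA2 merges reads internally, and the function names `reverse_complement` and `merge_pair` are hypothetical.

```python
# Illustrative only: an ungapped overlap scan, not DADA2's actual merging code.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def merge_pair(forward, reverse, min_overlap=12, max_merge_mismatch=0):
    """Merge a read pair, or return None if no acceptable overlap exists.

    The reverse read is reverse-complemented, then the end of the forward
    read is compared with the start of the reverse read, longest overlap
    first. An overlap is accepted when it is at least min_overlap long and
    contains at most max_merge_mismatch mismatches.
    """
    rc = reverse_complement(reverse)
    longest = min(len(forward), len(rc))
    for overlap in range(longest, min_overlap - 1, -1):
        tail = forward[-overlap:]
        head = rc[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_merge_mismatch:
            return forward + rc[overlap:]
    return None


# A made-up 40 bp amplicon read from both ends with a 15 bp overlap.
amplicon = "ACGTACCGTTAGGCATCGGATTCAGGCTAACGTTGACCAT"
forward_read = amplicon[:25]
reverse_read = reverse_complement(amplicon[10:])

merged = merge_pair(forward_read, reverse_read, min_overlap=12)
too_strict = merge_pair(forward_read, reverse_read, min_overlap=16)
```

With the default min_overlap of 12 the pair merges back into the full amplicon; raising min_overlap above the true 15 bp overlap makes the pair unmergeable, so it would be dropped.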
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence, and these sequences will be the joined paired-end sequences.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
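The "pseudo" pooling description above says that ASVs detected in at least 2 samples are recorded and reused as prior knowledge in a second pass. The recording step can be sketched as follows; the data layout (a dict of sample ID to ASV list) and the name `shared_asvs` are assumptions made for illustration, not part of the plugin.

```python
from collections import Counter


def shared_asvs(per_sample_asvs, min_samples=2):
    """Return the ASVs seen in at least min_samples different samples.

    per_sample_asvs maps a sample ID to the ASVs found when that sample
    was denoised on its own (hypothetical layout for this sketch).
    """
    # Count each ASV once per sample, regardless of how often it is listed.
    sample_counts = Counter(
        asv for asvs in per_sample_asvs.values() for asv in set(asvs)
    )
    return {asv for asv, n in sample_counts.items() if n >= min_samples}


first_pass = {
    "sample-1": ["ACGT", "GGCC"],
    "sample-2": ["GGCC", "TTAA"],
    "sample-3": ["TTAA", "GGCC", "TTAA"],
}
priors = shared_asvs(first_pass)
```

In this toy input, "GGCC" and "TTAA" would be carried into the second denoising pass as priors, while "ACGT", seen in only one sample, would not.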
Examples¶
denoise_paired¶
wget -O 'demux-paired.qza' \
'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'
qiime dada2 denoise-paired \
--i-demultiplexed-seqs demux-paired.qza \
--p-trunc-len-f 150 \
--p-trunc-len-r 140 \
--o-representative-sequences representative-sequences.qza \
--o-table table.qza \
--o-denoising-stats denoising-stats.qza \
--o-base-transition-stats base-transition-stats.qza
from qiime2 import Artifact
from urllib import request
import qiime2.plugins.dada2.actions as dada2_actions
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'
fn = 'demux-paired.qza'
request.urlretrieve(url, fn)
demux_paired = Artifact.load(fn)
table, representative_sequences, denoising_stats, base_transition_stats = dada2_actions.denoise_paired(
demultiplexed_seqs=demux_paired,
trunc_len_f=150,
trunc_len_r=140,
)
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: demux-paired.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 dada2 denoise-paired tool:
  - Set "demultiplexed_seqs" to #: demux-paired.qza
  - Set "trunc_len_f" to 150
  - Set "trunc_len_r" to 140
  - Press the Execute button.
library(reticulate)
Artifact <- import("qiime2")$Artifact
dada2_actions <- import("qiime2.plugins.dada2.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/dada2/denoise-paired/1/demux-paired.qza'
fn <- 'demux-paired.qza'
request$urlretrieve(url, fn)
demux_paired <- Artifact$load(fn)
action_results <- dada2_actions$denoise_paired(
demultiplexed_seqs=demux_paired,
trunc_len_f=150L,
trunc_len_r=140L,
)
representative_sequences <- action_results$representative_sequences
table <- action_results$table
denoising_stats <- action_results$denoising_stats
base_transition_stats <- action_results$base_transition_stats
from q2_dada2._examples import denoise_paired
denoise_paired(use)
dada2 denoise-pyro¶
This method denoises single-end pyrosequencing sequences, dereplicates them, and filters chimeras.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality] The single-end demultiplexed pyrosequencing sequences (e.g. 454, IonTorrent) to be denoised.[required]
Parameters¶
- trunc_len:
Int Position at which sequences should be truncated due to decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed.[required]
- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default:
0]- max_ee:
Float Reads with number of expected errors higher than this value will be discarded.[default:
2.0]- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than
trunc_len, it is discarded.[default:2]- max_len:
Int Remove reads prior to trimming or truncation which are longer than this value. If 0 is provided no reads will be removed based on length.[default:
0]- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default:
'independent']- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default:
'consensus']- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). This parameter has no effect if chimera_method is "none".[default:
1.0]- allow_one_off:
Bool Bimeras that are one-off from exact are also identified if the
allow_one_off argument is True. If True, a sequence will be identified as a bimera if it is one mismatch or indel away from an exact bimera.[default: False]- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default:
1]- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default:
250000]- hashed_feature_ids:
Bool If true, the feature ids in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default:
True]- retain_all_samples:
Bool If True, all samples input to dada2 will be retained in the output of dada2. If False, samples with zero total frequency are removed from the table.[default:
True]
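The filtering parameters above (max_len, trunc_q, trunc_len, trim_left, max_ee) can be read together as one per-read decision. The sketch below is a simplified illustration: the expected-error formula (the sum of 10^(-Q/10) over the retained bases) is the standard definition, but the order in which the steps are applied here is an assumption and may differ from what DADA2 does. The function names are hypothetical.

```python
def expected_errors(quals):
    """Expected number of errors in a read, from its Phred quality scores."""
    return sum(10 ** (-q / 10) for q in quals)


def filter_read(seq, quals, trunc_len, trim_left=0, max_ee=2.0,
                trunc_q=2, max_len=0):
    """Return the trimmed read, or None if the read is discarded.

    The step order below is an assumption made for this sketch.
    """
    # max_len is checked before any trimming or truncation.
    if max_len and len(seq) > max_len:
        return None
    # Truncate at the first base with quality <= trunc_q.
    for i, q in enumerate(quals):
        if q <= trunc_q:
            seq, quals = seq[:i], quals[:i]
            break
    # Reads shorter than trunc_len are discarded; 0 disables this step.
    if trunc_len:
        if len(seq) < trunc_len:
            return None
        seq, quals = seq[:trunc_len], quals[:trunc_len]
    # Remove low-quality bases from the 5' end.
    seq, quals = seq[trim_left:], quals[trim_left:]
    # Discard reads with too many expected errors.
    if expected_errors(quals) > max_ee:
        return None
    return seq


good = filter_read("ACGTACGTAC", [30] * 10, trunc_len=8, trim_left=2)
low_q = filter_read("ACGTACGTAC", [30] * 5 + [2] + [30] * 4, trunc_len=8)
noisy = filter_read("ACGTACGTAC", [3] * 10, trunc_len=0)
```

Here `good` survives as a 6-base read, `low_q` is discarded because quality truncation leaves it shorter than trunc_len, and `noisy` is discarded because its expected errors exceed max_ee.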
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
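The hashed_feature_ids description above relies on one property: the same sequence always produces the same ID, so tables from separate runs line up when merged. A minimal sketch of that property follows. It assumes the hash is an MD5 hex digest of the sequence string; the text above only says "hashes", so treat the choice of MD5 and the helper names as assumptions.

```python
import hashlib
from collections import Counter


def feature_id(sequence):
    """Deterministic ID for a sequence (MD5 is an assumption here)."""
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


def hashed_table(counts_by_sequence):
    """Re-key a {sequence: count} mapping by hashed feature ID."""
    return Counter(
        {feature_id(seq): n for seq, n in counts_by_sequence.items()}
    )


# Two hypothetical runs that share one sequence.
run_a = hashed_table({"ACGTACGT": 10, "GGCCGGCC": 3})
run_b = hashed_table({"ACGTACGT": 5, "TTAATTAA": 7})

# Counter addition sums counts for IDs present in both runs.
merged = run_a + run_b
```

Because the shared sequence hashes to the same ID in both runs, its counts are summed instead of appearing as two separate features.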
dada2 denoise-ccs¶
This method denoises single-end Pacbio CCS sequences, dereplicates them, and filters chimeras. Tutorial and workflow: https://
Citations¶
Callahan et al., 2016; Callahan et al., 2019
Inputs¶
- demultiplexed_seqs:
SampleData[SequencesWithQuality] The single-end demultiplexed PacBio CCS sequences to be denoised.[required]
Parameters¶
- front:
Str Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Can contain IUPAC ambiguous nucleotide codes. Note, primer direction is 5' to 3'. Primers are removed before trim and filter step. Reads that do not contain the primer are discarded. Each read is re-oriented if the reverse complement of the read is a better match to the provided primer sequence. This is recommended for PacBio CCS reads, which come in a random mix of forward and reverse-complement orientations.[required]
- adapter:
Str Sequence of an adapter ligated to the 3' end. The adapter and any preceding bases are trimmed. Can contain IUPAC ambiguous nucleotide codes. Note, primer direction is 5' to 3'. Primers are removed before trim and filter step. Reads that do not contain the primer are discarded.[optional]
- max_mismatch:
Int The number of mismatches to tolerate when matching reads to primer sequences - see http://
benjjneb .github .io /dada2/ for complete details.[default: 2]- indels:
Bool Allow insertions or deletions of bases when matching adapters. Note that enabling this can make primer matching significantly slower, currently about 4x.[default:
False]- trunc_len:
Int Position at which sequences should be truncated due to decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. If 0 is provided, no truncation or length filtering will be performed. Note: Because PacBio CCS sequences normally have very high quality scores, there is usually no need to truncate them.[default:
0]- trim_left:
Int Position at which sequences should be trimmed due to low quality. This trims the 5' end of the input sequences, which will be the bases that were sequenced in the first cycles.[default:
0]- max_ee:
Float Reads with number of expected errors higher than this value will be discarded.[default:
2.0]- trunc_q:
Int Reads are truncated at the first instance of a quality score less than or equal to this value. If the resulting read is then shorter than
trunc_len, it is discarded.[default:2]- min_len:
Int Remove reads shorter than this value. The minimum length is enforced after trimming and truncation. For 16S PacBio CCS, 1000 is suggested.[default:
20]- max_len:
Int Remove reads prior to trimming or truncation which are longer than this value. If 0 is provided no reads will be removed based on length. For 16S Pacbio CCS, suggest 1600.[default:
0]- pooling_method:
Str%Choices('independent', 'pseudo') The method used to pool samples for denoising. "independent": Samples are denoised independently. "pseudo": The pseudo-pooling method is used to approximate pooling of samples. In short, samples are denoised independently once, ASVs detected in at least 2 samples are recorded, and samples are denoised independently a second time, but this time with prior knowledge of the recorded ASVs and thus higher sensitivity to those ASVs.[default:
'independent']- chimera_method:
Str%Choices('consensus', 'none') The method used to remove chimeras. "none": No chimera removal is performed. "consensus": Chimeras are detected in samples individually, and sequences found chimeric in a sufficient fraction of samples are removed.[default:
'consensus']- min_fold_parent_over_abundance:
Float The minimum abundance of potential parents of a sequence being tested as chimeric, expressed as a fold-change versus the abundance of the sequence being tested. Values should be greater than or equal to 1 (i.e. parents should be more abundant than the sequence being tested). Suggest 3.5. This parameter has no effect if chimera_method is "none".[default:
3.5]- allow_one_off:
Bool Bimeras that are one-off from exact are also identified if the
allow_one_off argument is True. If True, a sequence will be identified as a bimera if it is one mismatch or indel away from an exact bimera.[default: False]- n_threads:
Threads The number of threads to use for multithreaded processing. If 0 is provided, all available cores will be used.[default:
1]- n_reads_learn:
Int The number of reads to use when training the error model. Smaller numbers will result in a shorter run time but a less reliable error model.[default:
1000000]- hashed_feature_ids:
Bool If true, the feature ids in the resulting table will be presented as hashes of the sequences defining each feature. The hash will always be the same for the same sequence so this allows feature tables to be merged across runs of this method. You should only merge tables if the exact same parameters are used for each run.[default:
True]- retain_all_samples:
Bool If True, all samples input to dada2 will be retained in the output of dada2. If False, samples with zero total frequency are removed from the table.[default:
True]
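The front parameter above combines three ideas: IUPAC ambiguity codes in the primer, a mismatch tolerance (max_mismatch), and re-orientation of reads whose reverse complement matches the primer better. The sketch below illustrates them under simplifying assumptions: the primer is only looked for at the very start of the read, no indels are allowed, and the helper names are hypothetical. It is not the primer-removal routine the plugin calls.

```python
# IUPAC nucleotide codes and the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_mismatches(primer, read):
    """Count mismatches between the primer and the start of the read."""
    if len(read) < len(primer):
        return len(primer)
    return sum(base not in IUPAC[code] for code, base in zip(primer, read))


def orient_and_trim(read, front, max_mismatch=2):
    """Orient the read to the primer, trim the primer, or return None."""
    candidates = [read, reverse_complement(read)]
    best = min(candidates, key=lambda r: primer_mismatches(front, r))
    if primer_mismatches(front, best) > max_mismatch:
        return None  # reads without the primer are discarded
    return best[len(front):]


front = "AGRGTTYGAT"  # shortened example primer with ambiguity codes
read = "AGAGTTTGATCCTGGCTCAGGATGAACGCT"

as_sequenced = orient_and_trim(read, front)
flipped = orient_and_trim(reverse_complement(read), front)
no_primer = orient_and_trim("TTTTTTTTTTTTTTTTTTTT", front)
```

Both orientations of the same read end up as the same trimmed sequence, which is the point of re-orienting PacBio CCS reads before denoising.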
Outputs¶
- table:
FeatureTable[Frequency] The resulting feature table.[required]
- representative_sequences:
FeatureData[Sequence] The resulting feature sequences. Each feature in the feature table will be represented by exactly one sequence.[required]
- denoising_stats:
SampleData[DADA2Stats] A table listing per-sample read retention counts and percentages after each stage of the pipeline.[required]
- base_transition_stats:
DADA2BaseTransitionStats A table listing the transition rates of each ordered pair of nucleotides at each quality score.[required]
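The effect of min_fold_parent_over_abundance described above can be sketched as a two-step check: first select potential parents that are sufficiently more abundant than the tested sequence, then ask whether the sequence can be built from the start of one parent and the end of another. This is a toy version that assumes equal-length sequences, exact matches only (the allow_one_off False case), and a strict "greater than" comparison; all three are assumptions, and the function names are hypothetical.

```python
def candidate_parents(abundances, query, min_fold=1.0):
    """Sequences abundant enough to be parents of the query.

    The strict '>' comparison is an assumption made for this sketch.
    """
    threshold = min_fold * abundances[query]
    return [
        seq for seq, n in abundances.items()
        if seq != query and n > threshold
    ]


def is_exact_bimera(query, parents):
    """True if query is a prefix of one parent joined to a suffix of another."""
    for left in parents:
        for right in parents:
            if left == right:
                continue
            for cut in range(1, len(query)):
                if left.startswith(query[:cut]) and right.endswith(query[cut:]):
                    return True
    return False


abundances = {
    "AAAAAAAA": 100,
    "CCCCCCCC": 80,
    "GGGGGGGG": 12,
    "AAAACCCC": 10,  # looks like a chimera of the two abundant sequences
}
query = "AAAACCCC"

parents_ccs = candidate_parents(abundances, query, min_fold=3.5)
flagged = is_exact_bimera(query, parents_ccs)
parents_strict = candidate_parents(abundances, query, min_fold=11.0)
```

Raising the fold threshold shrinks the pool of allowed parents, so fewer sequences can be flagged as chimeric; with a threshold of 11 in this toy data no parent qualifies and the query would be kept.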
dada2 plot-base-transitions¶
Generates visualizations of DADA2 output statistics.
Citations¶
Inputs¶
- base_transition_stats:
DADA2BaseTransitionStats DADA2 base transition statistics.[required]
Parameters¶
- nominalq:
Bool Sets the nominalq line of the visualization.[default:
False]- error_in:
Bool Sets the input error line of the visualization.[default:
False]- error_out:
Bool Sets the output error line of the visualization.[default:
True]
Outputs¶
- visualization:
Visualization <no description>[required]
- Available Distros
- 2025.10
- 2025.10/amplicon
- 2025.7
- 2025.7/amplicon
- 2025.4
- 2025.4/amplicon
- 2024.10
- 2024.10/amplicon
- 2024.5
- 2024.5/amplicon
- 2024.2
- 2024.2/amplicon
- 2023.9
- 2023.9/amplicon
- 2023.7
- 2023.7/core