q2-vsearch

Formats

vsearch cluster-features-de-novo

Given a feature table and the associated feature sequences, cluster the features based on a user-specified percent identity threshold applied to their sequences. This is not a general-purpose de novo clustering method. Rather, it is intended for clustering the output of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than the one it was originally clustered at. When a group of features in the input table is clustered into a single feature, the frequency of that feature in a given sample is the sum of the frequencies, in that sample, of the features that were clustered together. Feature identifiers and sequences are inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.
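
To make the frequency rule concrete, here is a minimal Python sketch of greedy centroid clustering. It is not the vsearch implementation. It compares equal-length sequences position by position, whereas vsearch aligns them. It visits features in order of decreasing total abundance, which is an assumption. It also uses hypothetical names (`identity`, `cluster_de_novo`) and plain dictionaries in place of QIIME 2 artifacts.

```python
from collections import defaultdict


def identity(a: str, b: str) -> float:
    """Fraction of matching positions; equal-length sequences only (a simplification)."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_de_novo(table, seqs, perc_identity):
    """Greedy centroid clustering returning (clustered_table, clustered_seqs).

    table: {sample_id: {feature_id: count}}; seqs: {feature_id: sequence}.
    """
    totals = defaultdict(int)
    for counts in table.values():
        for fid, n in counts.items():
            totals[fid] += n

    centroids, member_of = [], {}
    # Most abundant first; ties broken by feature ID so the result is deterministic.
    for fid in sorted(seqs, key=lambda f: (-totals[f], f)):
        for cid in centroids:
            if identity(seqs[fid], seqs[cid]) >= perc_identity:
                member_of[fid] = cid
                break
        else:  # no centroid close enough: this feature starts a new cluster
            centroids.append(fid)
            member_of[fid] = fid

    # Per sample, a cluster's frequency is the sum of its members' frequencies.
    clustered = {}
    for sample, counts in table.items():
        merged = defaultdict(int)
        for fid, n in counts.items():
            merged[member_of[fid]] += n
        clustered[sample] = dict(merged)
    # Each cluster keeps the ID and sequence of its centroid.
    return clustered, {cid: seqs[cid] for cid in centroids}


seqs = {"f1": "ACGTACGTAC", "f2": "ACGTACGTAA", "f3": "TTTTGGGGCC"}
table = {"s1": {"f1": 10, "f2": 2}, "s2": {"f2": 5, "f3": 7}}
clustered_table, clustered_seqs = cluster_de_novo(table, seqs, perc_identity=0.9)
print(clustered_table)  # {'s1': {'f1': 12}, 's2': {'f1': 5, 'f3': 7}}
```

Here f2 is 90% identical to f1 and joins its cluster. Sample s2 therefore reports 5 reads for f1, even though f1 itself was absent there.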

Citations

Inputs

sequences: FeatureData[Sequence]: The sequences corresponding to the features in table.[required]
table: FeatureTable[Frequency]: The feature table to be clustered.[required]

Parameters

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True): The percent identity at which clustering should be performed, expressed as a fraction greater than 0 and at most 1 (e.g., 0.97 for 97%). This parameter maps to vsearch's --id parameter.[required]
strand: Str % Choices('plus', 'both'): Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]
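
The parameter constraints above can be read as simple checks. The hypothetical helper below only illustrates the documented ranges. QIIME 2 validates parameters itself, and its Threads type may accept values that this sketch does not model.

```python
import os


def resolve_cluster_params(perc_identity, strand="plus", threads=1):
    """Check values against the documented ranges (hypothetical helper)."""
    # Range(0, 1, inclusive_start=False, inclusive_end=True) means 0 < x <= 1.
    if not 0 < perc_identity <= 1:
        raise ValueError("perc_identity is a fraction in (0, 1], e.g. 0.97 for 97%")
    if strand not in ("plus", "both"):
        raise ValueError("strand must be 'plus' or 'both'")
    if threads < 0:
        raise ValueError("threads must be 0 or a positive integer")
    # threads=0 means one thread per CPU core; os.cpu_count() can return None.
    if threads == 0:
        threads = os.cpu_count() or 1
    return {"perc_identity": perc_identity, "strand": strand, "threads": threads}


print(resolve_cluster_params(0.97))  # {'perc_identity': 0.97, 'strand': 'plus', 'threads': 1}
```

A common mistake is passing 97 for 97% identity. This sketch rejects that value, just as the documented Range(0, 1] would.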

Outputs

clustered_table: FeatureTable[Frequency]: The table following clustering of features.[required]
clustered_sequences: FeatureData[Sequence]: Sequences representing clustered features.[required]

Examples

cluster_features_de_novo

Command Line:

wget -O 'seqs1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'

wget -O 'table1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'

qiime vsearch cluster-features-de-novo \
  --i-sequences seqs1.qza \
  --i-table table1.qza \
  --p-perc-identity 0.97 \
  --p-strand plus \
  --p-threads 1 \
  --o-clustered-table clustered-table.qza \
  --o-clustered-sequences clustered-sequences.qza

Python API:

from qiime2 import Artifact
from urllib import request
import qiime2.plugins.vsearch.actions as vsearch_actions

url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn = 'seqs1.qza'
request.urlretrieve(url, fn)
seqs1 = Artifact.load(fn)

url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn = 'table1.qza'
request.urlretrieve(url, fn)
table1 = Artifact.load(fn)

# Cluster the features at 97% identity, searching the forward (plus) strand only.
clustered_table, clustered_sequences = vsearch_actions.cluster_features_de_novo(
    sequences=seqs1,
    table=table1,
    perc_identity=0.97,
    strand='plus',
    threads=1,
)

R API:

library(reticulate)

Artifact <- import("qiime2")$Artifact
request <- import("urllib")$request
vsearch_actions <- import("qiime2.plugins.vsearch.actions")

url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn <- 'seqs1.qza'
request$urlretrieve(url, fn)
seqs1 <- Artifact$load(fn)

url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn <- 'table1.qza'
request$urlretrieve(url, fn)
table1 <- Artifact$load(fn)

action_results <- vsearch_actions$cluster_features_de_novo(
    sequences=seqs1,
    table=table1,
    perc_identity=0.97,
    strand='plus',
    threads=1L
)
clustered_table <- action_results$clustered_table
clustered_sequences <- action_results$clustered_sequences


vsearch cluster-features-closed-reference

Given a feature table and the associated feature sequences, cluster the features against a reference database based on a user-specified percent identity threshold applied to their sequences. This is not a general-purpose closed-reference clustering method. Rather, it is intended for clustering the output of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than the one it was originally clustered at. When a group of features in the input table is clustered into a single feature, the frequency of that feature in a given sample is the sum of the frequencies, in that sample, of the features that were clustered together. Because the reference sequences serve as cluster centroids, feature identifiers are inherited from the reference sequence that each cluster matched. See the vsearch documentation for details on how sequence clustering is performed.
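
The following is a minimal sketch of the closed-reference behavior, built as a toy example with several simplifications. Sequences are compared by position-wise identity (equal lengths only, no alignment). Each feature goes to its best-matching reference at or above the threshold. Features with no match are returned separately, mirroring unmatched_sequences, instead of being kept in the table. The function names and dictionary layout are hypothetical.

```python
from collections import defaultdict


def identity(a: str, b: str) -> float:
    """Fraction of matching positions; equal-length sequences only (a simplification)."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_closed_reference(table, seqs, refs, perc_identity):
    """Return (clustered_table, unmatched_seqs); matched features take reference IDs."""
    assignment = {}
    for fid, seq in seqs.items():
        hits = [(identity(seq, rseq), rid) for rid, rseq in refs.items()]
        hits = [h for h in hits if h[0] >= perc_identity]
        if hits:
            # Highest identity wins; ties go to the smallest reference ID (an assumption).
            assignment[fid] = min(hits, key=lambda h: (-h[0], h[1]))[1]

    clustered = {}
    for sample, counts in table.items():
        merged = defaultdict(int)
        for fid, n in counts.items():
            if fid in assignment:  # unmatched features are left out of the table
                merged[assignment[fid]] += n
        clustered[sample] = dict(merged)
    unmatched = {fid: seq for fid, seq in seqs.items() if fid not in assignment}
    return clustered, unmatched


refs = {"R1": "ACGTACGTAC", "R2": "GGGGCCCCAA"}
seqs = {"f1": "ACGTACGTAA", "f2": "ACGTACGTAC", "f3": "TTTTTTTTTT"}
table = {"s1": {"f1": 3, "f2": 4, "f3": 1}}
clustered_table, unmatched = cluster_closed_reference(table, seqs, refs, perc_identity=0.9)
print(clustered_table, unmatched)  # {'s1': {'R1': 7}} {'f3': 'TTTTTTTTTT'}
```

Both f1 and f2 map to R1, so s1 reports 3 + 4 = 7 reads under the reference ID R1.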

Citations

Inputs

sequences: FeatureData[Sequence]: The sequences corresponding to the features in table.[required]
table: FeatureTable[Frequency]: The feature table to be clustered.[required]
reference_sequences: FeatureData[Sequence]: The sequences to use as cluster centroids.[required]

Parameters

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True): The percent identity at which clustering should be performed, expressed as a fraction greater than 0 and at most 1 (e.g., 0.97 for 97%). This parameter maps to vsearch's --id parameter.[required]
strand: Str % Choices('plus', 'both'): Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs

clustered_table: FeatureTable[Frequency]: The table following clustering of features.[required]
clustered_sequences: FeatureData[Sequence]: The sequences representing clustered features, relabeled by the reference IDs.[required]
unmatched_sequences: FeatureData[Sequence]: The sequences which failed to match any reference sequences. This output maps to vsearch's --notmatched parameter.[required]

vsearch dereplicate-sequences

Dereplicate sequence data and create a feature table and feature representative sequences. Feature identifiers in the resulting artifacts will be the SHA-1 hash of the sequence defining each feature. If clustering of features into OTUs is desired, the resulting artifacts can be passed to the cluster_features_* methods in this plugin.
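
The following sketch shows exact dereplication with SHA-1 feature identifiers, using Python's hashlib. It rests on a few assumptions:

- reads are plain strings grouped by sample;
- min_unique_size is compared against abundance pooled across all samples;
- the hash is taken over the sequence text exactly as given.

The function name `dereplicate` is hypothetical.

```python
import hashlib
from collections import Counter, defaultdict


def dereplicate(samples, min_seq_length=1, min_unique_size=1):
    """Build (table, feature_seqs) from {sample_id: [read, ...]} (hypothetical sketch)."""
    per_sample = defaultdict(Counter)
    totals = Counter()
    for sample, reads in samples.items():
        for seq in reads:
            if len(seq) < min_seq_length:
                continue  # shorter than min_seq_length: discarded
            per_sample[sample][seq] += 1
            totals[seq] += 1
    # Keep unique sequences whose pooled abundance is at least min_unique_size.
    kept = {seq for seq, n in totals.items() if n >= min_unique_size}
    # Feature ID = hexadecimal SHA-1 digest of the sequence text.
    ids = {seq: hashlib.sha1(seq.encode("ascii")).hexdigest() for seq in kept}
    table = {
        sample: {ids[seq]: n for seq, n in counts.items() if seq in kept}
        for sample, counts in per_sample.items()
    }
    return table, {ids[seq]: seq for seq in kept}


reads = {"s1": ["ACGT", "ACGT", "AC"], "s2": ["ACGT", "GGCC"]}
table, feature_seqs = dereplicate(reads, min_seq_length=3, min_unique_size=2)
# "AC" is too short and "GGCC" occurs once, so only "ACGT" survives.
```

Because the identifier is a hash of the sequence, the same sequence receives the same feature ID in every run. This is what lets tables from separate runs share features.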

Citations

Inputs

sequences: SampleData[Sequences] | SampleData[SequencesWithQuality] | SampleData[JoinedSequencesWithQuality]: The sequences to be dereplicated.[required]

Parameters

derep_prefix: Bool: Merge sequences with identical prefixes. If a sequence is identical to the prefix of two or more longer sequences, it is clustered with the shortest of them. If two or more of those candidates are equally short, it is clustered with the most abundant of them.[default: False]
min_seq_length: Int % Range(1, None): Discard sequences shorter than this integer.[default: 1]
min_unique_size: Int % Range(1, None): Discard sequences with a post-dereplication abundance value smaller than this integer.[default: 1]
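
The derep_prefix tie-breaking rule above can be sketched as follows. It illustrates only the stated rule and assumes exact-dereplicated abundances as input. How vsearch handles nested prefixes (a prefix of a prefix) is not modelled. Breaking remaining ties by sequence is an assumption, and `prefix_targets` is a hypothetical name.

```python
def prefix_targets(abundances):
    """Map each sequence to the sequence it would be merged into (one level only).

    abundances: {sequence: count} after exact dereplication.
    """
    targets = {}
    for seq in abundances:
        # Longer sequences that begin with this sequence.
        longer = [s for s in abundances if len(s) > len(seq) and s.startswith(seq)]
        if not longer:
            targets[seq] = seq  # not a prefix of anything: stays as its own feature
            continue
        # Shortest candidate first, then most abundant, then sequence order (assumed).
        targets[seq] = min(longer, key=lambda s: (len(s), -abundances[s], s))
    return targets


abundances = {"ACG": 5, "ACGTT": 2, "ACGAA": 7, "ACGTTGG": 9}
targets = prefix_targets(abundances)
# "ACG" -> "ACGAA": ACGTT and ACGAA are equally short, and ACGAA is more abundant.
```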

Outputs

dereplicated_table: FeatureTable[Frequency]: The table of dereplicated sequences.[required]
dereplicated_sequences: FeatureData[Sequence]: The dereplicated sequences.[required]

vsearch merge-pairs

Merge paired-end sequence reads using vsearch's merge_pairs function. See the vsearch documentation for details on how paired-end merging is performed, and for more information on the parameters to this method.

Citations

Inputs

demultiplexed_seqs: SampleData[PairedEndSequencesWithQuality]: The demultiplexed paired-end sequences to be merged.[required]

Parameters

truncqual: Int % Range(0, None): Truncate sequences at the first base with the specified quality score value or lower.[optional]
minlen: Int % Range(0, None): Sequences shorter than minlen after truncation are discarded.[default: 1]
maxns: Int % Range(0, None): Sequences with more than maxns N characters are discarded.[optional]
allowmergestagger: Bool: Allow merging of staggered read pairs.[default: False]
minovlen: Int % Range(5, None): Minimum length of the area of overlap between reads during merging.[default: 10]
maxdiffs: Int % Range(0, None): Maximum number of mismatches in the area of overlap during merging.[default: 10]
minmergelen: Int % Range(0, None): Minimum length of the merged read to be retained.[optional]
maxmergelen: Int % Range(0, None): Maximum length of the merged read to be retained.[optional]
maxee: Float % Range(0.0, None): Maximum number of expected errors in the merged read to be retained.[optional]
threads: Threads: The number of threads to use for computation. Does not scale much past 4 threads.[default: 1]
qmin: Int % Range(0, None): Minimum quality score accepted when reading FASTQ files.[default: 0]
qminout: Int % Range(0, None): Minimum quality score used when writing FASTQ files. Only applies to overlap region.[default: 0]
qmax: Int % Range(0, None): Maximum quality score accepted when reading FASTQ files.[default: 41]
qmaxout: Int % Range(0, None): Maximum quality score used when writing FASTQ files. Only applies to overlap region.[default: 41]
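
Several merge-pairs parameters reduce to simple per-read arithmetic. The sketch below assumes Phred+33 quality encoding and computes maxee with the usual expected-error sum, EE = Σ 10^(-Q/10). It does not attempt the overlap alignment that minovlen, maxdiffs, and allowmergestagger govern. The order in which vsearch applies these filters is also an assumption, and all function names are hypothetical.

```python
def decode_quals(qual: str, offset: int = 33) -> list:
    """Phred scores from a FASTQ quality string (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


def expected_errors(quals) -> float:
    """Sum of per-base error probabilities, 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in quals)


def truncate_at_qual(seq, quals, truncqual):
    """Cut the read at the first base with quality <= truncqual; that base is dropped too."""
    for i, q in enumerate(quals):
        if q <= truncqual:
            return seq[:i], quals[:i]
    return seq, quals


def passes_read_filters(seq, minlen=1, maxns=None):
    """minlen is checked after truncation; maxns caps the number of N characters."""
    if len(seq) < minlen:
        return False
    return maxns is None or seq.upper().count("N") <= maxns


def passes_merged_filters(seq, quals, minmergelen=None, maxmergelen=None, maxee=None):
    """Length and expected-error limits applied to a merged read."""
    if minmergelen is not None and len(seq) < minmergelen:
        return False
    if maxmergelen is not None and len(seq) > maxmergelen:
        return False
    return maxee is None or expected_errors(quals) <= maxee


quals = decode_quals("IIII#")                               # [40, 40, 40, 40, 2]
seq, q = truncate_at_qual("ACGTA", quals, truncqual=2)      # ("ACGT", [40, 40, 40, 40])
```

A single Q2 base contributes about 0.63 expected errors. That one base is enough to push the five-base read above a maxee of 0.5.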

Outputs

merged_sequences: SampleData[JoinedSequencesWithQuality]: The merged sequences.[required]
unmerged_sequences: SampleData[PairedEndSequencesWithQuality]: The unmerged paired-end reads.[required]

vsearch uchime-ref

Apply the vsearch uchime_ref method to identify chimeric feature sequences. The results of this method can be used to filter chimeric features from the corresponding feature table. For additional details, please refer to the vsearch documentation.

Citations

Inputs

sequences: FeatureData[Sequence]: The feature sequences to be chimera-checked.[required]
table: FeatureTable[Frequency]: Feature table (used for computing total feature abundances).[required]
reference_sequences: FeatureData[Sequence]: The non-chimeric reference sequences.[required]

Parameters

dn: Float % Range(0.0, None): Pseudo-count for 'no' votes, corresponding to the parameter n in the chimera scoring function.[default: 1.4]
mindiffs: Int % Range(1, None): Minimum number of differences per segment.[default: 3]
mindiv: Float % Range(0.0, None): Minimum divergence from closest parent.[default: 0.8]
minh: Float % Range(0.0, 1.0, inclusive_end=True): Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity.[default: 0.28]
xn: Float % Range(1.0, None, inclusive_start=False): Weight of a 'no' vote, corresponding to the parameter beta in the chimera scoring function.[default: 8.0]
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]
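
The UCHIME paper (Edgar et al., 2011) helps explain how dn, xn, and minh interact. As we understand it, the paper defines the score as h = Y / (β(N + n) + A), where Y, N, and A are the yes, no, and abstain vote totals, β corresponds to xn, and n corresponds to dn. The sketch below evaluates that formula for given vote totals. It does not show how votes are tallied from alignments. Combining minh, mindiv, and mindiffs into a single rule, as flag_chimera does, is a simplification. Both function names are hypothetical.

```python
def uchime_score(yes: float, no: float, abstain: float,
                 xn: float = 8.0, dn: float = 1.4) -> float:
    """Chimera score h = Y / (xn * (N + dn) + A); xn plays beta and dn plays n."""
    return yes / (xn * (no + dn) + abstain)


def flag_chimera(h: float, divergence: float, diffs_a: int, diffs_b: int,
                 minh: float = 0.28, mindiv: float = 0.8, mindiffs: int = 3) -> bool:
    """Simplified decision: score, divergence, and per-segment differences must all pass."""
    return h >= minh and divergence >= mindiv and min(diffs_a, diffs_b) >= mindiffs


strong = uchime_score(yes=10, no=0, abstain=2)  # 10 / (8 * 1.4 + 2), about 0.758
weak = uchime_score(yes=3, no=2, abstain=5)     # 3 / (8 * 3.4 + 5), about 0.093
```

Each 'no' vote is multiplied by xn, so a few disagreeing columns quickly pull h below minh. This matches the documented trade-off between false positives and sensitivity.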

Outputs

chimeras: FeatureData[Sequence]: The chimeric sequences.[required]
nonchimeras: FeatureData[Sequence]: The non-chimeric sequences.[required]
stats: UchimeStats: Summary statistics from chimera checking.[required]

vsearch uchime-denovo

Apply one of the vsearch uchime*_denovo methods to identify chimeric feature sequences. The results of these methods can be used to filter chimeric features from the corresponding feature table. For more details, please refer to the vsearch manual.
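
The table serves two purposes here: it supplies total feature abundances, and it is the thing the results are meant to filter. Both bookkeeping steps can be sketched in Python. The ';size=N' FASTA header follows vsearch's usual abundance annotation, but whether q2-vsearch writes its input exactly this way is an assumption. All function names are hypothetical.

```python
from collections import Counter


def total_abundances(table):
    """Sum each feature's frequency across all samples."""
    totals = Counter()
    for counts in table.values():
        totals.update(counts)  # Counter.update adds counts
    return totals


def sized_fasta(seqs, totals):
    """FASTA text with ';size=N' headers, most abundant first."""
    lines = []
    for fid in sorted(seqs, key=lambda f: (-totals[f], f)):
        lines.append(f">{fid};size={totals[fid]}")
        lines.append(seqs[fid])
    return "\n".join(lines) + "\n"


def filter_table(table, keep_ids):
    """Keep only features in keep_ids (e.g. the nonchimeras); drop samples left empty."""
    filtered = {}
    for sample, counts in table.items():
        kept = {fid: n for fid, n in counts.items() if fid in keep_ids}
        if kept:
            filtered[sample] = kept
    return filtered


table = {"s1": {"a": 3, "b": 1}, "s2": {"a": 2, "c": 4}}
seqs = {"a": "ACGT", "b": "GGGG", "c": "TTTT"}
totals = total_abundances(table)
fasta = sized_fasta(seqs, totals)
filtered = filter_table(table, {"c"})  # {'s2': {'c': 4}}
```

Dropping samples left empty after filtering is a design choice of this sketch. A real workflow may prefer to keep them and report zero counts.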

Citations

Rognes et al., 2016; Edgar et al., 2011; Edgar, 2016; Edgar, 2016

Inputs

sequences: FeatureData[Sequence]: The feature sequences to be chimera-checked.[required]
table: FeatureTable[Frequency]: Feature table (used for computing total feature abundances).[required]

Parameters

method: Str % Choices('uchime', 'uchime2', 'uchime3'): De novo chimera detection based on uchime (Edgar 2011), uchime2 (Edgar 2016), or uchime3 (Edgar 2016).[default: 'uchime']
dn: Float % Range(0.0, None): Pseudo-count for 'no' votes, corresponding to the parameter n in the chimera scoring function.[default: 1.4]
mindiffs: Int % Range(1, None): Minimum number of differences per segment. Ignored for uchime2 and uchime3.[default: 3]
mindiv: Float % Range(0.0, None): Minimum divergence from closest parent. Ignored for uchime2 and uchime3.[default: 0.8]
minh: Float % Range(0.0, 1.0, inclusive_end=True): Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity. Ignored for uchime2 and uchime3.[default: 0.28]
xn: Float % Range(1.0, None, inclusive_start=False): Weight of a 'no' vote, corresponding to the parameter beta in the chimera scoring function.[default: 8.0]

Outputs

chimeras: FeatureData[Sequence]: The chimeric sequences.[required]
nonchimeras: FeatureData[Sequence]: The non-chimeric sequences.[required]
stats: UchimeStats: Summary statistics from chimera checking.[required]

vsearch fastq-stats

An overview of FASTQ data produced with vsearch's fastq_stats, fastq_eestats, and fastq_eestats2 utilities. See https://github.com/torognes/vsearch for detailed documentation of these tools.
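
As a rough illustration of the kind of summary these utilities report, the sketch below parses simple four-line FASTQ records. It reports the read count, the length range, and the mean expected errors. It is not a reimplementation of fastq_stats or fastq_eestats. Phred+33 encoding and the function names are assumptions.

```python
def parse_fastq(text: str):
    """Yield (read_id, sequence, quality) from simple four-line FASTQ records."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines) - 3, 4):
        yield lines[i][1:], lines[i + 1], lines[i + 3]


def summarize_fastq(text: str, offset: int = 33) -> dict:
    """Read count, length range, and mean expected errors (Phred+33 assumed)."""
    lengths, ees = [], []
    for _, seq, qual in parse_fastq(text):
        lengths.append(len(seq))
        # Expected errors: sum of 10 ** (-Q / 10) over the read's bases.
        ees.append(sum(10 ** (-(ord(c) - offset) / 10) for c in qual))
    if not lengths:
        return {"reads": 0}
    return {
        "reads": len(lengths),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "mean_expected_errors": sum(ees) / len(ees),
    }


fq = "@r1\nACGT\n+\nIIII\n@r2\nAC\n+\n##\n"
stats = summarize_fastq(fq)
```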

Citations

Inputs

sequences: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]: The FASTQ sequences to summarize.[required]

Parameters

threads: Threads: The number of threads used for computation.[default: 1]

Outputs

visualization: Visualization: <no description>[required]

vsearch cluster-features-open-reference

Given a feature table and the associated feature sequences, cluster the features against a reference database based on a user-specified percent identity threshold applied to their sequences. Any sequences that do not match a reference are then clustered de novo. This is not a general-purpose clustering method. Rather, it is intended for clustering the output of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than the one it was originally clustered at. When a group of features in the input table is clustered into a single feature, the frequency of that feature in a given sample is the sum of the frequencies, in that sample, of the features that were clustered together.

Feature identifiers are inherited from the centroid feature of each cluster. For features that match a reference sequence, the centroid is that reference sequence, so its identifier becomes the feature identifier. The clustered_sequences result contains a representative sequence, drawn from the input sequences, for every feature in clustered_table. This is always the most abundant sequence in the cluster. The new_reference_sequences result contains the entire reference database plus the representative sequences of any de novo features. It is intended to be used as the reference database in subsequent runs of cluster_features_open_reference, if applicable. See the vsearch documentation for details on how sequence clustering is performed.
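
The two-stage logic described above can be sketched end to end. This toy version makes three simplifications: it uses position-wise identity on equal-length sequences instead of alignment, it clusters the unmatched features greedily in order of abundance, and its function names are hypothetical. It mirrors the documented outputs (clustered table, most-abundant representatives, and the extended reference set), not vsearch's internals.

```python
from collections import defaultdict


def identity(a: str, b: str) -> float:
    """Fraction of matching positions; equal-length sequences only (a simplification)."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_open_reference(table, seqs, refs, perc_identity):
    """Return (clustered_table, representative_seqs, new_reference_seqs)."""
    totals = defaultdict(int)
    for counts in table.values():
        for fid, n in counts.items():
            totals[fid] += n
    by_abundance = sorted(seqs, key=lambda f: (-totals[f], f))

    member_of = {}
    # Stage 1: closed reference; a matched feature takes the best reference ID.
    for fid in seqs:
        hits = [(identity(seqs[fid], rseq), rid) for rid, rseq in refs.items()]
        hits = [h for h in hits if h[0] >= perc_identity]
        if hits:
            member_of[fid] = min(hits, key=lambda h: (-h[0], h[1]))[1]

    # Stage 2: greedy de novo clustering of the unmatched features, most abundant first.
    centroids = []
    for fid in by_abundance:
        if fid in member_of:
            continue
        for cid in centroids:
            if identity(seqs[fid], seqs[cid]) >= perc_identity:
                member_of[fid] = cid
                break
        else:
            centroids.append(fid)
            member_of[fid] = fid

    # Per sample, sum member counts into their cluster ID.
    clustered = {}
    for sample, counts in table.items():
        merged = defaultdict(int)
        for fid, n in counts.items():
            merged[member_of[fid]] += n
        clustered[sample] = dict(merged)

    # Representative: the most abundant input sequence in each cluster.
    reps = {}
    for fid in by_abundance:
        reps.setdefault(member_of[fid], seqs[fid])

    # New reference set: the whole reference database plus the de novo centroids.
    new_refs = {**refs, **{cid: seqs[cid] for cid in centroids}}
    return clustered, reps, new_refs


refs = {"R1": "ACGTACGTAC"}
seqs = {"f1": "ACGTACGTAA", "f2": "TTTTGGGGCC", "f3": "TTTTGGGGCA"}
table = {"s1": {"f1": 4, "f2": 6}, "s2": {"f1": 1, "f3": 2}}
clustered_table, rep_seqs, new_refs = cluster_open_reference(table, seqs, refs, 0.9)
```

The results work out as follows:

- f1 matches R1, so it is reported under R1, with its own sequence as the representative.
- f2 and f3 form a de novo cluster named after f2.
- new_refs holds R1 plus f2, ready to serve as the reference set in a later run.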

Citations

Inputs

sequences: FeatureData[Sequence]: The sequences corresponding to the features in table.[required]
table: FeatureTable[Frequency]: The feature table to be clustered.[required]
reference_sequences: FeatureData[Sequence]: The sequences to use as cluster centroids.[required]

Parameters

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True): The percent identity at which clustering should be performed, expressed as a fraction greater than 0 and at most 1 (e.g., 0.97 for 97%). This parameter maps to vsearch's --id parameter.[required]
strand: Str % Choices('plus', 'both'): Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs

clustered_table: FeatureTable[Frequency]: The table following clustering of features.[required]
clustered_sequences: FeatureData[Sequence]: Sequences representing clustered features.[required]
new_reference_sequences: FeatureData[Sequence]: The new reference sequences. This can be used for subsequent runs of open-reference clustering for consistent definitions of features across open-reference feature tables.[required]

This plugin wraps the vsearch application, and provides methods for clustering and dereplicating features and sequences.

version: 2025.7.0.dev0
website: https://github.com/qiime2/q2-vsearch
user support: Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
citations: Rognes et al., 2016

Actions

Name	Type	Short Description
cluster-features-de-novo	method	De novo clustering of features.
cluster-features-closed-reference	method	Closed-reference clustering of features.
dereplicate-sequences	method	Dereplicate sequences.
merge-pairs	method	Merge paired-end reads.
uchime-ref	method	Reference-based chimera filtering.
uchime-denovo	method	De novo chimera filtering.
fastq-stats	visualizer	Fastq stats with vsearch.
cluster-features-open-reference	pipeline	Open-reference clustering of features.

Artifact Classes

Formats¶

vsearch cluster-features-de-novo¶

Citations¶

Inputs¶

sequences: FeatureData[Sequence]: The sequences corresponding to the features in table.[required]
table: FeatureTable[Frequency]: The feature table to be clustered.[required]

Parameters¶

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True): The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]
strand: Str % Choices('plus', 'both'): Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs¶

clustered_table: FeatureTable[Frequency]: The table following clustering of features.[required]
clustered_sequences: FeatureData[Sequence]: Sequences representing clustered features.[required]

Examples¶

cluster_features_de_novo¶

[Command Line]

[Python API]

[Galaxy]

[R API]

[View Source]

wget -O 'seqs1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'

wget -O 'table1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'

qiime vsearch cluster-features-de-novo \
  --i-sequences seqs1.qza \
  --i-table table1.qza \
  --p-perc-identity 0.97 \
  --p-strand plus \
  --p-threads 1 \
  --o-clustered-table clustered-table.qza \
  --o-clustered-sequences clustered-sequences.qza

from qiime2 import Artifact
from urllib import request
import qiime2.plugins.vsearch.actions as vsearch_actions

url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn = 'seqs1.qza'
request.urlretrieve(url, fn)
seqs1 = Artifact.load(fn)

url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn = 'table1.qza'
request.urlretrieve(url, fn)
table1 = Artifact.load(fn)

clustered_table, clustered_sequences = vsearch_actions.cluster_features_de_novo(
    sequences=seqs1,
    table=table1,
    perc_identity=0.97,
    strand='plus',
    threads=1,
)

library(reticulate)

Artifact <- import("qiime2")$Artifact
request <- import("urllib")$request
vsearch_actions <- import("qiime2.plugins.vsearch.actions")

url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn <- 'seqs1.qza'
request$urlretrieve(url, fn)
seqs1 <- Artifact$load(fn)

url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn <- 'table1.qza'
request$urlretrieve(url, fn)
table1 <- Artifact$load(fn)

action_results <- vsearch_actions$cluster_features_de_novo(
    sequences=seqs1,
    table=table1,
    perc_identity=0.97,
    strand='plus',
    threads=1L,
)
clustered_table <- action_results$clustered_table
clustered_sequences <- action_results$clustered_sequences

seqs1.qza | download | view
table1.qza | download | view
clustered-table.qza | download | view
clustered-sequences.qza | download | view

vsearch cluster-features-closed-reference¶

Given a feature table and the associated feature sequences, cluster the features against a reference database based on user-specified percent identity threshold of their sequences. This is not a general-purpose closed-reference clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.

Citations¶

Inputs¶

sequences: FeatureData[Sequence]: The sequences corresponding to the features in table.[required]
table: FeatureTable[Frequency]: The feature table to be clustered.[required]
reference_sequences: FeatureData[Sequence]: The sequences to use as cluster centroids.[required]

Parameters¶

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True): The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]
strand: Str % Choices('plus', 'both'): Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs¶

clustered_table: FeatureTable[Frequency]: The table following clustering of features.[required]
clustered_sequences: FeatureData[Sequence]: The sequences representing clustered features, relabeled by the reference IDs.[required]
unmatched_sequences: FeatureData[Sequence]: The sequences which failed to match any reference sequences. This output maps to vsearch's --notmatched parameter.[required]

vsearch dereplicate-sequences¶

Citations¶

Inputs¶

sequences: SampleData[Sequences] | SampleData[SequencesWithQuality] | SampleData[JoinedSequencesWithQuality]: The sequences to be dereplicated.[required]

Parameters¶

derep_prefix: Bool: Merge sequences with identical prefixes. If a sequence is identical to the prefix of two or more longer sequences, it is clustered with the shortest of them. If they are equally long, it is clustered with the most abundant.[default: False]
min_seq_length: Int % Range(1, None): Discard sequences shorter than this integer.[default: 1]
min_unique_size: Int % Range(1, None): Discard sequences with a post-dereplication abundance value smaller than integer.[default: 1]

Outputs¶

dereplicated_table: FeatureTable[Frequency]: The table of dereplicated sequences.[required]
dereplicated_sequences: FeatureData[Sequence]: The dereplicated sequences.[required]

vsearch merge-pairs¶

Citations¶

Inputs¶

demultiplexed_seqs: SampleData[PairedEndSequencesWithQuality]: The demultiplexed paired-end sequences to be merged.[required]

Parameters¶

truncqual: Int % Range(0, None): Truncate sequences at the first base with the specified quality score value or lower.[optional]
minlen: Int % Range(0, None): Sequences shorter than minlen after truncation are discarded.[default: 1]
maxns: Int % Range(0, None): Sequences with more than maxns N characters are discarded.[optional]
allowmergestagger: Bool: Allow merging of staggered read pairs.[default: False]
minovlen: Int % Range(5, None): Minimum length of the area of overlap between reads during merging.[default: 10]
maxdiffs: Int % Range(0, None): Maximum number of mismatches in the area of overlap during merging.[default: 10]
minmergelen: Int % Range(0, None): Minimum length of the merged read to be retained.[optional]
maxmergelen: Int % Range(0, None): Maximum length of the merged read to be retained.[optional]
maxee: Float % Range(0.0, None): Maximum number of expected errors in the merged read to be retained.[optional]
threads: Threads: The number of threads to use for computation. Does not scale much past 4 threads.[default: 1]
qmin: Int % Range(0, None): Minimum quality score accepted when reading FASTQ files.[default: 0]
qminout: Int % Range(0, None): Minimum quality score used when writing FASTQ files. Only applies to overlap region.[default: 0]
qmax: Int % Range(0, None): Maximum quality score accepted when reading FASTQ files.[default: 41]
qmaxout: Int % Range(0, None): Maximum quality score used when writing FASTQ files. Only applies to overlap region.[default: 41]

Outputs¶

merged_sequences: SampleData[JoinedSequencesWithQuality]: The merged sequences.[required]
unmerged_sequences: SampleData[PairedEndSequencesWithQuality]: The unmerged paired-end reads.[required]

vsearch uchime-ref¶

Citations¶

Inputs¶

sequences: FeatureData[Sequence]: The feature sequences to be chimera-checked.[required]
table: FeatureTable[Frequency]: Feature table (used for computing total feature abundances).[required]
reference_sequences: FeatureData[Sequence]: The non-chimeric reference sequences.[required]

Parameters¶

dn: Float % Range(0.0, None): No vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default: 1.4]
mindiffs: Int % Range(1, None): Minimum number of differences per segment.[default: 3]
mindiv: Float % Range(0.0, None): Minimum divergence from closest parent.[default: 0.8]
minh: Float % Range(0.0, 1.0, inclusive_end=True): Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity.[default: 0.28]
xn: Float % Range(1.0, None, inclusive_start=False): No vote weight, corresponding to the parameter beta in the scoring function.[default: 8.0]
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs¶

chimeras: FeatureData[Sequence]: The chimeric sequences.[required]
nonchimeras: FeatureData[Sequence]: The non-chimeric sequences.[required]
stats: UchimeStats: Summary statistics from chimera checking.[required]

vsearch uchime-denovo¶

Citations¶

Rognes et al., 2016; Edgar et al., 2011; Edgar, 2016; Edgar, 2016

Inputs¶

sequences: FeatureData[Sequence]: The feature sequences to be chimera-checked.[required]
table: FeatureTable[Frequency]: Feature table (used for computing total feature abundances).[required]

Parameters¶

method: Str % Choices('uchime', 'uchime2', 'uchime3'): Denovo chimera detection based on uchime (Edgar 2011), uchime2 (Edgar 2016), or uchime3 (Edgar 2016).[default: 'uchime']
dn: Float % Range(0.0, None): No vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default: 1.4]
mindiffs: Int % Range(1, None): Minimum number of differences per segment. Ignored for uchime2 and uchime3.[default: 3]
mindiv: Float % Range(0.0, None): Minimum divergence from closest parent. Ignored for uchime2 and uchime3.[default: 0.8]
minh: Float % Range(0.0, 1.0, inclusive_end=True): Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity. Ignored for uchime2 and uchime3.[default: 0.28]
xn: Float % Range(1.0, None, inclusive_start=False): No vote weight, corresponding to the parameter beta in the scoring function.[default: 8.0]

Outputs¶

chimeras: FeatureData[Sequence]: The chimeric sequences.[required]
nonchimeras: FeatureData[Sequence]: The non-chimeric sequences.[required]
stats: UchimeStats: Summary statistics from chimera checking.[required]

vsearch fastq-stats¶

A fastq overview via vsearch's fastq_stats, fastq_eestats and fastq_eestats2 utilities. Please see https://github.com/torognes/vsearch for detailed documentation of these tools.

Citations¶

Inputs¶

sequences: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]: Fastq sequences[required]

Parameters¶

threads: Threads: The number of threads used for computation.[default: 1]

Outputs¶

visualization: Visualization: <no description>[required]

vsearch cluster-features-open-reference¶

Given a feature table and the associated feature sequences, cluster the features against a reference database based on user-specified percent identity threshold of their sequences. Any sequences that don't match are then clustered de novo. This is not a general-purpose clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. For features that match a reference sequence, the centroid feature is that reference sequence, so its identifier will become the feature identifier. The clustered_sequences result will contain feature representative sequences that are derived from the sequences input for all features in clustered_table. This will always be the most abundant sequence in the cluster. The new_reference_sequences result will contain the entire reference database, plus feature representative sequences for any de novo features. This is intended to be used as a reference database in subsequent iterations of cluster_features_open_reference, if applicable. See the vsearch documentation for details on how sequence clustering is performed.

Citations¶

Inputs¶

sequences: FeatureData[Sequence]: The sequences corresponding to the features in table.[required]
table: FeatureTable[Frequency]: The feature table to be clustered.[required]
reference_sequences: FeatureData[Sequence]: The sequences to use as cluster centroids.[required]

Parameters¶

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True): The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]
strand: Str % Choices('plus', 'both'): Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs¶

clustered_table: FeatureTable[Frequency]: The table following clustering of features.[required]
clustered_sequences: FeatureData[Sequence]: Sequences representing clustered features.[required]
new_reference_sequences: FeatureData[Sequence]: The new reference sequences. This can be used for subsequent runs of open-reference clustering for consistent definitions of features across open-reference feature tables.[required]

This plugin wraps the vsearch application, and provides methods for clustering and dereplicating features and sequences.

version: 2025.7.0.dev0
website: https://github.com/qiime2/q2-vsearch
user support:: Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
citations:: Rognes et al., 2016

Actions¶

Name	Type	Short Description
cluster-features-de-novo	method	De novo clustering of features.
cluster-features-closed-reference	method	Closed-reference clustering of features.
dereplicate-sequences	method	Dereplicate sequences.
merge-pairs	method	Merge paired-end reads.
uchime-ref	method	Reference-based chimera filtering.
uchime-denovo	method	De novo chimera filtering.
fastq-stats	visualizer	Fastq stats with vsearch.
cluster-features-open-reference	pipeline	Open-reference clustering of features.

Artifact Classes¶

Formats¶

vsearch cluster-features-de-novo¶

Citations¶

Inputs¶

sequences: FeatureData[Sequence]: The sequences corresponding to the features in table.[required]
table: FeatureTable[Frequency]: The feature table to be clustered.[required]

Parameters¶

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True): The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]
strand: Str % Choices('plus', 'both'): Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs¶

clustered_table: FeatureTable[Frequency]: The table following clustering of features.[required]
clustered_sequences: FeatureData[Sequence]: Sequences representing clustered features.[required]
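
Clustering at a perc_identity threshold can be pictured as greedy centroid clustering. The plain-Python sketch below works under stated assumptions. Features are visited in decreasing abundance. Each feature joins the first existing centroid whose identity with it is at least perc_identity; otherwise it becomes a new centroid. The helpers toy_identity and greedy_cluster are hypothetical and are not part of q2-vsearch. Identity here is a Hamming-style fraction over equal-length sequences, whereas vsearch computes identity from pairwise alignments and applies its own hit-ordering rules.

```python
def toy_identity(a, b):
    """Fraction of matching positions (toy stand-in for alignment identity).

    ASSUMPTION: sequences of different lengths count as 0.0 identical.
    """
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, abundance, perc_identity):
    """Map each feature ID to the ID of its cluster centroid."""
    # Visit features from most to least abundant (ties broken by ID),
    # so abundant features become centroids first.
    order = sorted(seqs, key=lambda fid: (-abundance[fid], fid))
    centroids = []
    assignment = {}
    for fid in order:
        for cid in centroids:
            if toy_identity(seqs[fid], seqs[cid]) >= perc_identity:
                assignment[fid] = cid  # join the first centroid close enough
                break
        else:
            centroids.append(fid)  # nothing close enough: start a new cluster
            assignment[fid] = fid
    return assignment


seqs = {"f1": "ACGTACGTAC", "f2": "ACGTACGTAA", "f3": "TTTTTTTTTT"}
abundance = {"f1": 50, "f2": 5, "f3": 20}
print(greedy_cluster(seqs, abundance, perc_identity=0.9))
# {'f1': 'f1', 'f3': 'f3', 'f2': 'f1'}
```

Here f2 is 90% identical to the more abundant f1, so it is absorbed into f1's cluster and inherits f1's identifier. With perc_identity=0.95 it would form its own cluster.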

Examples¶

cluster_features_de_novo¶

[Command Line]

wget -O 'seqs1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'

wget -O 'table1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'

qiime vsearch cluster-features-de-novo \
  --i-sequences seqs1.qza \
  --i-table table1.qza \
  --p-perc-identity 0.97 \
  --p-strand plus \
  --p-threads 1 \
  --o-clustered-table clustered-table.qza \
  --o-clustered-sequences clustered-sequences.qza

[Python API]

from qiime2 import Artifact
from urllib import request
import qiime2.plugins.vsearch.actions as vsearch_actions

url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn = 'seqs1.qza'
request.urlretrieve(url, fn)
seqs1 = Artifact.load(fn)

url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn = 'table1.qza'
request.urlretrieve(url, fn)
table1 = Artifact.load(fn)

clustered_table, clustered_sequences = vsearch_actions.cluster_features_de_novo(
    sequences=seqs1,
    table=table1,
    perc_identity=0.97,
    strand='plus',
    threads=1,
)

[R API]

library(reticulate)

Artifact <- import("qiime2")$Artifact
request <- import("urllib")$request
vsearch_actions <- import("qiime2.plugins.vsearch.actions")

url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn <- 'seqs1.qza'
request$urlretrieve(url, fn)
seqs1 <- Artifact$load(fn)

url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn <- 'table1.qza'
request$urlretrieve(url, fn)
table1 <- Artifact$load(fn)

action_results <- vsearch_actions$cluster_features_de_novo(
    sequences=seqs1,
    table=table1,
    perc_identity=0.97,
    strand='plus',
    threads=1L
)
clustered_table <- action_results$clustered_table
clustered_sequences <- action_results$clustered_sequences

vsearch cluster-features-closed-reference¶

Given a feature table and the associated feature sequences, cluster the features against a reference database based on a user-specified percent identity threshold applied to their sequences. This is not a general-purpose closed-reference clustering method; rather, it is intended for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table is clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.
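
The frequency-summing rule can be checked with a short Python sketch. It assumes a feature table stored as a nested dict {sample_id: {feature_id: frequency}}. It also assumes a mapping from each input feature to the clustered feature it joined, which for closed-reference clustering would be a reference ID. collapse_table is a hypothetical helper, not part of q2-vsearch. Dropping features that are missing from the mapping is likewise an assumption of the sketch.

```python
from collections import defaultdict


def collapse_table(table, assignment):
    """Sum per-sample frequencies of features that share a cluster.

    table:      {sample_id: {feature_id: frequency}}
    assignment: {input_feature_id: clustered_feature_id}
    """
    collapsed = {}
    for sample, counts in table.items():
        sums = defaultdict(int)
        for fid, count in counts.items():
            if fid not in assignment:
                continue  # ASSUMPTION: features without a cluster are dropped
            sums[assignment[fid]] += count
        collapsed[sample] = dict(sums)
    return collapsed


table = {"S1": {"f1": 10, "f2": 3, "f3": 2}, "S2": {"f1": 1, "f2": 4, "f3": 7}}
assignment = {"f1": "refA", "f2": "refA", "f3": "refB"}
print(collapse_table(table, assignment))
# {'S1': {'refA': 13, 'refB': 2}, 'S2': {'refA': 5, 'refB': 7}}
```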

Citations¶

Inputs¶

sequences: FeatureData[Sequence]: The sequences corresponding to the features in table.[required]
table: FeatureTable[Frequency]: The feature table to be clustered.[required]
reference_sequences: FeatureData[Sequence]: The sequences to use as cluster centroids.[required]

Parameters¶

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True): The percent identity at which clustering should be performed, expressed as a fraction (e.g., 0.97 for 97% identity). This parameter maps to vsearch's --id parameter.[required]
strand: Str % Choices('plus', 'both'): Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs¶

clustered_table: FeatureTable[Frequency]: The table following clustering of features.[required]
clustered_sequences: FeatureData[Sequence]: The sequences representing clustered features, relabeled by the reference IDs.[required]
unmatched_sequences: FeatureData[Sequence]: The sequences which failed to match any reference sequences. This output maps to vsearch's --notmatched parameter.[required]
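
To make the split between clustered_sequences and unmatched_sequences concrete, here is a hypothetical plain-Python sketch of closed-reference assignment. Each feature is compared against every reference with a toy equal-length identity measure; this is an assumption, since vsearch aligns sequences. The feature is assigned to the best-scoring reference if that score reaches perc_identity, and is otherwise reported as unmatched. Only the plus strand is considered, which corresponds to strand='plus'. Picking the single best reference is a simplification, not a description of vsearch's search heuristics.

```python
def toy_identity(a, b):
    """Fraction of matching positions; 0.0 for unequal lengths (toy assumption)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closed_reference(seqs, refs, perc_identity):
    """Split features into reference assignments and unmatched sequences."""
    assignment = {}  # feature ID -> reference ID (becomes the new feature ID)
    unmatched = {}   # feature ID -> sequence with no acceptable reference hit
    for fid, seq in seqs.items():
        scores = {rid: toy_identity(seq, rseq) for rid, rseq in refs.items()}
        best = max(scores, key=scores.get) if scores else None
        if best is not None and scores[best] >= perc_identity:
            assignment[fid] = best
        else:
            unmatched[fid] = seq
    return assignment, unmatched


refs = {"refA": "ACGTACGTAC", "refB": "GGGGCCCCAA"}
seqs = {"f1": "ACGTACGTAA", "f2": "ACGTACGTAC", "f3": "TTTTTTTTTT"}
assignment, unmatched = closed_reference(seqs, refs, perc_identity=0.9)
print(assignment)  # {'f1': 'refA', 'f2': 'refA'}
print(unmatched)   # {'f3': 'TTTTTTTTTT'}
```

Both f1 and f2 land on refA, so their per-sample frequencies would be summed under the refA identifier, while f3 ends up in the unmatched output.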

vsearch dereplicate-sequences¶

Citations¶

Inputs¶

sequences: SampleData[Sequences] | SampleData[SequencesWithQuality] | SampleData[JoinedSequencesWithQuality]: The sequences to be dereplicated.[required]

Parameters¶

derep_prefix: Bool: Merge sequences with identical prefixes. If a sequence is identical to the prefix of two or more longer sequences, it is clustered with the shortest of them. If they are equally long, it is clustered with the most abundant.[default: False]
min_seq_length: Int % Range(1, None): Discard sequences shorter than this length.[default: 1]
min_unique_size: Int % Range(1, None): Discard sequences with a post-dereplication abundance value smaller than this integer.[default: 1]

Outputs¶

dereplicated_table: FeatureTable[Frequency]: The table of dereplicated sequences.[required]
dereplicated_sequences: FeatureData[Sequence]: The dereplicated sequences.[required]
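
The derep_prefix, min_seq_length and min_unique_size parameters can be read as a small pipeline. The hypothetical dereplicate function below runs these steps in order:

1. Filter reads by length.
2. Collapse identical sequences into uniques with abundances.
3. Optionally fold prefixes into longer sequences, following the rule stated for derep_prefix.
4. Drop low-abundance uniques.

It is a sketch of the documented rules, not vsearch's implementation. In particular, the order in which chained prefixes are resolved is an assumption.

```python
from collections import Counter


def merge_prefixes(counts):
    """Fold each sequence into a longer sequence that it is a prefix of."""
    merged = dict(counts)
    # ASSUMPTION: shortest sequences first, so chains s -> t -> u accumulate.
    for seq in sorted(counts, key=len):
        longer = [t for t in merged if len(t) > len(seq) and t.startswith(seq)]
        if longer:
            # Shortest candidate wins; equal lengths go to the most abundant.
            target = min(longer, key=lambda t: (len(t), -merged[t]))
            merged[target] += merged.pop(seq)
    return merged


def dereplicate(reads, derep_prefix=False, min_seq_length=1, min_unique_size=1):
    """Return {unique_sequence: abundance} after the documented filters."""
    kept = [r for r in reads if len(r) >= min_seq_length]  # length filter first
    counts = dict(Counter(kept))  # exact dereplication
    if derep_prefix:
        counts = merge_prefixes(counts)
    # The abundance filter applies to post-dereplication counts.
    return {s: n for s, n in counts.items() if n >= min_unique_size}


reads = ["ACGT", "ACGT", "ACGTAA", "ACGTTT", "ACGTTT", "AC", "GGG"]
print(dereplicate(reads, min_seq_length=3, min_unique_size=2))
# {'ACGT': 2, 'ACGTTT': 2}
print(dereplicate(reads, derep_prefix=True, min_seq_length=3, min_unique_size=2))
# {'ACGTTT': 4}
```

With derep_prefix=True, ACGT is a prefix of two equally long sequences, so it is folded into the more abundant one (ACGTTT).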

vsearch merge-pairs¶

Citations¶

Inputs¶

demultiplexed_seqs: SampleData[PairedEndSequencesWithQuality]: The demultiplexed paired-end sequences to be merged.[required]

Parameters¶

truncqual: Int % Range(0, None): Truncate sequences at the first base with the specified quality score value or lower.[optional]
minlen: Int % Range(0, None): Sequences shorter than minlen after truncation are discarded.[default: 1]
maxns: Int % Range(0, None): Sequences with more than maxns N characters are discarded.[optional]
allowmergestagger: Bool: Allow merging of staggered read pairs.[default: False]
minovlen: Int % Range(5, None): Minimum length of the area of overlap between reads during merging.[default: 10]
maxdiffs: Int % Range(0, None): Maximum number of mismatches in the area of overlap during merging.[default: 10]
minmergelen: Int % Range(0, None): Minimum length of the merged read to be retained.[optional]
maxmergelen: Int % Range(0, None): Maximum length of the merged read to be retained.[optional]
maxee: Float % Range(0.0, None): Maximum number of expected errors in the merged read to be retained.[optional]
threads: Threads: The number of threads to use for computation. Does not scale much past 4 threads.[default: 1]
qmin: Int % Range(0, None): Minimum quality score accepted when reading FASTQ files.[default: 0]
qminout: Int % Range(0, None): Minimum quality score used when writing FASTQ files. Only applies to the overlap region.[default: 0]
qmax: Int % Range(0, None): Maximum quality score accepted when reading FASTQ files.[default: 41]
qmaxout: Int % Range(0, None): Maximum quality score used when writing FASTQ files. Only applies to the overlap region.[default: 41]

Outputs¶

merged_sequences: SampleData[JoinedSequencesWithQuality]: The merged sequences.[required]
unmerged_sequences: SampleData[PairedEndSequencesWithQuality]: The unmerged paired-end reads.[required]
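
The merging parameters interact roughly as in the deliberately simplified Python sketch below; all function names are hypothetical. The reverse read is reverse-complemented, and every non-staggered overlap of at least minovlen bases is scored by its mismatch count (i.e., allowmergestagger=False). The overlap with the fewest mismatches, and no more than maxdiffs, is used. The merged read is then rejected if its expected number of errors exceeds maxee, where expected errors are the sum of 10^(-Q/10) over its Phred scores. vsearch's real merger aligns the reads and computes posterior quality scores. Here, by assumption, the higher-quality base simply wins inside the overlap.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def expected_errors(quals):
    """Expected number of errors: sum of 10 ** (-Q / 10) over Phred scores."""
    return sum(10 ** (-q / 10) for q in quals)


def merge_pair(fwd, fwd_q, rev, rev_q, minovlen=10, maxdiffs=10, maxee=None):
    """Merge one read pair; return (sequence, qualities) or None on failure."""
    r_seq, r_q = revcomp(rev), list(reversed(rev_q))
    best = None  # (mismatches, overlap_length) of the preferred overlap
    # Non-staggered overlaps only: a suffix of fwd against a prefix of r_seq.
    for ov in range(minovlen, min(len(fwd), len(r_seq)) + 1):
        diffs = sum(a != b for a, b in zip(fwd[-ov:], r_seq[:ov]))
        # Prefer fewer mismatches, then a longer overlap.
        if diffs <= maxdiffs and (best is None or (diffs, -ov) < (best[0], -best[1])):
            best = (diffs, ov)
    if best is None:
        return None  # no acceptable overlap
    ov = best[1]
    start = len(fwd) - ov
    seq, quals = list(fwd[:start]), list(fwd_q[:start])
    for i in range(ov):
        # ASSUMPTION: inside the overlap, keep the higher-quality base.
        if fwd_q[start + i] >= r_q[i]:
            seq.append(fwd[start + i])
            quals.append(fwd_q[start + i])
        else:
            seq.append(r_seq[i])
            quals.append(r_q[i])
    seq.extend(r_seq[ov:])
    quals.extend(r_q[ov:])
    if maxee is not None and expected_errors(quals) > maxee:
        return None  # merged read exceeds the expected-error limit
    return "".join(seq), quals


fwd = "ACGTTGCAAGGC"
rev = "GTAAGCCTTGCA"  # reverse complement of the last 12 bases of the amplicon
merged = merge_pair(fwd, [30] * 12, rev, [30] * 12, minovlen=5, maxdiffs=2, maxee=1.0)
print(merged[0])  # ACGTTGCAAGGCTTAC
```

Sixteen bases at Q30 give about 0.016 expected errors, so the same pair would be rejected with maxee=0.01.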

vsearch uchime-ref¶

Citations¶

Inputs¶

sequences: FeatureData[Sequence]: The feature sequences to be chimera-checked.[required]
table: FeatureTable[Frequency]: Feature table (used for computing total feature abundances).[required]
reference_sequences: FeatureData[Sequence]: The non-chimeric reference sequences.[required]

Parameters¶

dn: Float % Range(0.0, None): No-vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default: 1.4]
mindiffs: Int % Range(1, None): Minimum number of differences per segment.[default: 3]
mindiv: Float % Range(0.0, None): Minimum divergence from closest parent.[default: 0.8]
minh: Float % Range(0.0, 1.0, inclusive_end=True): Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity.[default: 0.28]
xn: Float % Range(1.0, None, inclusive_start=False): No-vote weight, corresponding to the parameter beta in the scoring function.[default: 8.0]
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs¶

chimeras: FeatureData[Sequence]: The chimeric sequences.[required]
nonchimeras: FeatureData[Sequence]: The non-chimeric sequences.[required]
stats: UchimeStats: Summary statistics from chimera checking.[required]

vsearch uchime-denovo¶

Citations¶

Rognes et al., 2016; Edgar et al., 2011; Edgar, 2016; Edgar, 2016

Inputs¶

sequences: FeatureData[Sequence]: The feature sequences to be chimera-checked.[required]
table: FeatureTable[Frequency]: Feature table (used for computing total feature abundances).[required]

Parameters¶

method: Str % Choices('uchime', 'uchime2', 'uchime3'): De novo chimera detection based on uchime (Edgar 2011), uchime2 (Edgar 2016), or uchime3 (Edgar 2016).[default: 'uchime']
dn: Float % Range(0.0, None): No-vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default: 1.4]
mindiffs: Int % Range(1, None): Minimum number of differences per segment. Ignored for uchime2 and uchime3.[default: 3]
mindiv: Float % Range(0.0, None): Minimum divergence from closest parent. Ignored for uchime2 and uchime3.[default: 0.8]
minh: Float % Range(0.0, 1.0, inclusive_end=True): Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity. Ignored for uchime2 and uchime3.[default: 0.28]
xn: Float % Range(1.0, None, inclusive_start=False): No-vote weight, corresponding to the parameter beta in the scoring function.[default: 8.0]

Outputs¶

chimeras: FeatureData[Sequence]: The chimeric sequences.[required]
nonchimeras: FeatureData[Sequence]: The non-chimeric sequences.[required]
stats: UchimeStats: Summary statistics from chimera checking.[required]
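
Both uchime-ref and uchime-denovo take the feature table only to compute total feature abundances. One plausible way to pass those totals to vsearch is its ';size=N' header annotation. The sketch below assumes that convention and is not taken from the q2-vsearch source. The helper names total_abundances and fasta_with_sizes are hypothetical.

```python
def total_abundances(table):
    """Total frequency of each feature across all samples."""
    totals = {}
    for counts in table.values():  # table: {sample_id: {feature_id: frequency}}
        for fid, n in counts.items():
            totals[fid] = totals.get(fid, 0) + n
    return totals


def fasta_with_sizes(seqs, table):
    """FASTA text with ';size=N' abundance labels, most abundant first."""
    totals = total_abundances(table)
    order = sorted(seqs, key=lambda fid: (-totals.get(fid, 0), fid))
    return "".join(f">{fid};size={totals.get(fid, 0)}\n{seqs[fid]}\n" for fid in order)


table = {"S1": {"f1": 10, "f2": 1}, "S2": {"f1": 5, "f2": 2}}
seqs = {"f2": "ACGA", "f1": "ACGT"}
print(fasta_with_sizes(seqs, table), end="")
# >f1;size=15
# ACGT
# >f2;size=3
# ACGA
```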

vsearch fastq-stats¶

An overview of FASTQ data produced with vsearch's fastq_stats, fastq_eestats, and fastq_eestats2 utilities. See https://github.com/torognes/vsearch for detailed documentation of these tools.
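
fastq_eestats2 reports, per the vsearch documentation, how many reads would survive different combinations of truncation length and maximum expected errors. The sketch below computes a table in that spirit from raw FASTQ text. It assumes four-line records and Phred+33 quality encoding. The helper names and cutoffs are hypothetical and do not reproduce vsearch's defaults or output format.

```python
def read_fastq(text):
    """Parse four-line FASTQ records into (id, sequence, quality) tuples."""
    lines = text.strip().splitlines()
    return [(lines[i][1:], lines[i + 1], lines[i + 3]) for i in range(0, len(lines), 4)]


def phred(qual, offset=33):
    """Decode a quality string, ASSUMING Phred+33 encoding."""
    return [ord(c) - offset for c in qual]


def expected_errors(quals):
    """Sum of per-base error probabilities, 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in quals)


def ee_retention(records, lengths, max_ees):
    """Count reads at least L bases long whose first L bases have EE <= cutoff."""
    table = {}
    for length in lengths:
        for cutoff in max_ees:
            table[(length, cutoff)] = sum(
                1
                for _, seq, qual in records
                if len(seq) >= length
                and expected_errors(phred(qual[:length])) <= cutoff
            )
    return table


fastq = "@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTAC\n+\nIIIII#\n"
records = read_fastq(fastq)
for (length, cutoff), n in ee_retention(records, [5, 6, 8], [0.5, 1.0]).items():
    print(f"length={length} max_ee={cutoff}: {n} of {len(records)} reads")
```

The single Q2 base ('#') at the end of r2 contributes about 0.63 expected errors. That is enough to drop r2 at max_ee=0.5 once the truncation length reaches 6.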

Citations¶

Inputs¶

sequences: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]: The FASTQ sequences to summarize.[required]

Parameters¶

threads: Threads: The number of threads used for computation.[default: 1]

Outputs¶

visualization: Visualization: <no description>[required]

vsearch cluster-features-open-reference¶

Given a feature table and the associated feature sequences, cluster the features against a reference database based on a user-specified percent identity threshold applied to their sequences. Any sequences that don't match a reference are then clustered de novo. This is not a general-purpose clustering method; rather, it is intended for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at.

When a group of features in the input table is clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. For features that match a reference sequence, the centroid is that reference sequence, so its identifier becomes the feature identifier.

The clustered_sequences result contains a representative sequence, drawn from the input sequences, for every feature in clustered_table; this is always the most abundant sequence in the cluster. The new_reference_sequences result contains the entire reference database plus the representative sequences of any de novo features. It is intended to be used as the reference database in subsequent runs of cluster_features_open_reference, if applicable. See the vsearch documentation for details on how sequence clustering is performed.
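
The three stages described above can be strung together in a compact plain-Python sketch:

1. Closed-reference matching.
2. De novo clustering of the leftovers.
3. Building new_reference_sequences.

Loudly stated assumptions apply. Identity is an equal-length Hamming fraction instead of an alignment-based measure. Matches go to the best-scoring reference. Unmatched features are clustered greedily in decreasing abundance. open_reference and toy_identity are hypothetical names, and this is not how q2-vsearch is implemented internally.

```python
def toy_identity(a, b):
    """Fraction of matching positions; 0.0 for unequal lengths (toy assumption)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def open_reference(seqs, abundance, refs, perc_identity):
    """Return (feature -> cluster ID mapping, new reference sequences)."""
    assignment, unmatched = {}, {}
    # Step 1: closed-reference matching against the reference database.
    for fid, seq in seqs.items():
        scores = {rid: toy_identity(seq, rseq) for rid, rseq in refs.items()}
        best = max(scores, key=scores.get) if scores else None
        if best is not None and scores[best] >= perc_identity:
            assignment[fid] = best  # the reference ID becomes the feature ID
        else:
            unmatched[fid] = seq
    # Step 2: greedy de novo clustering of the leftovers, most abundant first.
    centroids = []
    for fid in sorted(unmatched, key=lambda f: (-abundance[f], f)):
        for cid in centroids:
            if toy_identity(unmatched[fid], unmatched[cid]) >= perc_identity:
                assignment[fid] = cid
                break
        else:
            centroids.append(fid)
            assignment[fid] = fid
    # Step 3: new reference = full reference database + de novo centroids.
    new_refs = dict(refs)
    new_refs.update({cid: unmatched[cid] for cid in centroids})
    return assignment, new_refs


refs = {"refA": "ACGTACGTAC"}
seqs = {"f1": "ACGTACGTAA", "f2": "TTTTTTTTTT", "f3": "TTTTTTTTTA"}
abundance = {"f1": 5, "f2": 20, "f3": 3}
assignment, new_refs = open_reference(seqs, abundance, refs, perc_identity=0.9)
print(assignment)  # {'f1': 'refA', 'f2': 'f2', 'f3': 'f2'}
print(new_refs)    # {'refA': 'ACGTACGTAC', 'f2': 'TTTTTTTTTT'}
```

Feeding new_refs back in as refs on a later run would let f2-like features match as references, which is the consistency benefit the description attributes to new_reference_sequences.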

Citations¶

Inputs¶

sequences: FeatureData[Sequence]: The sequences corresponding to the features in table.[required]
table: FeatureTable[Frequency]: The feature table to be clustered.[required]
reference_sequences: FeatureData[Sequence]: The sequences to use as cluster centroids.[required]

Parameters¶

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True): The percent identity at which clustering should be performed, expressed as a fraction (e.g., 0.97 for 97% identity). This parameter maps to vsearch's --id parameter.[required]
strand: Str % Choices('plus', 'both'): Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs¶

clustered_table: FeatureTable[Frequency]: The table following clustering of features.[required]
clustered_sequences: FeatureData[Sequence]: Sequences representing clustered features.[required]
new_reference_sequences: FeatureData[Sequence]: The new reference sequences. This can be used for subsequent runs of open-reference clustering for consistent definitions of features across open-reference feature tables.[required]

This plugin wraps the vsearch application, and provides methods for clustering and dereplicating features and sequences.

version: 2025.7.0.dev0
website: https://github.com/qiime2/q2-vsearch
user support:: Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
citations:: Rognes et al., 2016

Actions¶

Name	Type	Short Description
cluster-features-de-novo	method	De novo clustering of features.
cluster-features-closed-reference	method	Closed-reference clustering of features.
dereplicate-sequences	method	Dereplicate sequences.
merge-pairs	method	Merge paired-end reads.
uchime-ref	method	Reference-based chimera filtering.
uchime-denovo	method	De novo chimera filtering.
fastq-stats	visualizer	Fastq stats with vsearch.
cluster-features-open-reference	pipeline	Open-reference clustering of features.

Artifact Classes¶

Formats¶

vsearch cluster-features-de-novo¶

Citations¶

Inputs¶

sequences: FeatureData[Sequence]: The sequences corresponding to the features in table.[required]
table: FeatureTable[Frequency]: The feature table to be clustered.[required]

Parameters¶

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True): The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]
strand: Str % Choices('plus', 'both'): Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs¶

clustered_table: FeatureTable[Frequency]: The table following clustering of features.[required]
clustered_sequences: FeatureData[Sequence]: Sequences representing clustered features.[required]

Examples¶

cluster_features_de_novo¶

[Command Line]

[Python API]

[Galaxy]

[R API]

[View Source]

wget -O 'seqs1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'

wget -O 'table1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'

qiime vsearch cluster-features-de-novo \
  --i-sequences seqs1.qza \
  --i-table table1.qza \
  --p-perc-identity 0.97 \
  --p-strand plus \
  --p-threads 1 \
  --o-clustered-table clustered-table.qza \
  --o-clustered-sequences clustered-sequences.qza

from qiime2 import Artifact
from urllib import request
import qiime2.plugins.vsearch.actions as vsearch_actions

url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn = 'seqs1.qza'
request.urlretrieve(url, fn)
seqs1 = Artifact.load(fn)

url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn = 'table1.qza'
request.urlretrieve(url, fn)
table1 = Artifact.load(fn)

clustered_table, clustered_sequences = vsearch_actions.cluster_features_de_novo(
    sequences=seqs1,
    table=table1,
    perc_identity=0.97,
    strand='plus',
    threads=1,
)

library(reticulate)

Artifact <- import("qiime2")$Artifact
request <- import("urllib")$request
vsearch_actions <- import("qiime2.plugins.vsearch.actions")

url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn <- 'seqs1.qza'
request$urlretrieve(url, fn)
seqs1 <- Artifact$load(fn)

url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn <- 'table1.qza'
request$urlretrieve(url, fn)
table1 <- Artifact$load(fn)

action_results <- vsearch_actions$cluster_features_de_novo(
    sequences=seqs1,
    table=table1,
    perc_identity=0.97,
    strand='plus',
    threads=1L,
)
clustered_table <- action_results$clustered_table
clustered_sequences <- action_results$clustered_sequences

seqs1.qza | download | view
table1.qza | download | view
clustered-table.qza | download | view
clustered-sequences.qza | download | view

vsearch cluster-features-closed-reference¶

Given a feature table and the associated feature sequences, cluster the features against a reference database based on user-specified percent identity threshold of their sequences. This is not a general-purpose closed-reference clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.

Citations¶

Inputs¶

sequences: FeatureData[Sequence]: The sequences corresponding to the features in table.[required]
table: FeatureTable[Frequency]: The feature table to be clustered.[required]
reference_sequences: FeatureData[Sequence]: The sequences to use as cluster centroids.[required]

Parameters¶

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True): The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]
strand: Str % Choices('plus', 'both'): Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs¶

clustered_table: FeatureTable[Frequency]: The table following clustering of features.[required]
clustered_sequences: FeatureData[Sequence]: The sequences representing clustered features, relabeled by the reference IDs.[required]
unmatched_sequences: FeatureData[Sequence]: The sequences which failed to match any reference sequences. This output maps to vsearch's --notmatched parameter.[required]

vsearch dereplicate-sequences¶

Citations¶

Inputs¶

sequences: SampleData[Sequences] | SampleData[SequencesWithQuality] | SampleData[JoinedSequencesWithQuality]: The sequences to be dereplicated.[required]

Parameters¶

derep_prefix: Bool: Merge sequences with identical prefixes. If a sequence is identical to the prefix of two or more longer sequences, it is clustered with the shortest of them. If they are equally long, it is clustered with the most abundant.[default: False]
min_seq_length: Int % Range(1, None): Discard sequences shorter than this integer.[default: 1]
min_unique_size: Int % Range(1, None): Discard sequences with a post-dereplication abundance value smaller than integer.[default: 1]

Outputs¶

dereplicated_table: FeatureTable[Frequency]: The table of dereplicated sequences.[required]
dereplicated_sequences: FeatureData[Sequence]: The dereplicated sequences.[required]

vsearch merge-pairs¶

Citations¶

Inputs¶

demultiplexed_seqs: SampleData[PairedEndSequencesWithQuality]: The demultiplexed paired-end sequences to be merged.[required]

Parameters¶

truncqual: Int % Range(0, None): Truncate sequences at the first base with the specified quality score value or lower.[optional]
minlen: Int % Range(0, None): Sequences shorter than minlen after truncation are discarded.[default: 1]
maxns: Int % Range(0, None): Sequences with more than maxns N characters are discarded.[optional]
allowmergestagger: Bool: Allow merging of staggered read pairs.[default: False]
minovlen: Int % Range(5, None): Minimum length of the area of overlap between reads during merging.[default: 10]
maxdiffs: Int % Range(0, None): Maximum number of mismatches in the area of overlap during merging.[default: 10]
minmergelen: Int % Range(0, None): Minimum length of the merged read to be retained.[optional]
maxmergelen: Int % Range(0, None): Maximum length of the merged read to be retained.[optional]
maxee: Float % Range(0.0, None): Maximum number of expected errors in the merged read to be retained.[optional]
threads: Threads: The number of threads to use for computation. Does not scale much past 4 threads.[default: 1]
qmin: Int % Range(0, None): Minimum quality score accepted when reading FASTQ files.[default: 0]
qminout: Int % Range(0, None): Minimum quality score used when writing FASTQ files. Only applies to overlap region.[default: 0]
qmax: Int % Range(0, None): Maximum quality score accepted when reading FASTQ files.[default: 41]
qmaxout: Int % Range(0, None): Maximum quality score used when writing FASTQ files. Only applies to overlap region.[default: 41]

Outputs¶

merged_sequences: SampleData[JoinedSequencesWithQuality]: The merged sequences.[required]
unmerged_sequences: SampleData[PairedEndSequencesWithQuality]: The unmerged paired-end reads.[required]

vsearch uchime-ref¶

Citations¶

Inputs¶

sequences: FeatureData[Sequence]: The feature sequences to be chimera-checked.[required]
table: FeatureTable[Frequency]: Feature table (used for computing total feature abundances).[required]
reference_sequences: FeatureData[Sequence]: The non-chimeric reference sequences.[required]

Parameters¶

dn: Float % Range(0.0, None): No vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default: 1.4]
mindiffs: Int % Range(1, None): Minimum number of differences per segment.[default: 3]
mindiv: Float % Range(0.0, None): Minimum divergence from closest parent.[default: 0.8]
minh: Float % Range(0.0, 1.0, inclusive_end=True): Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity.[default: 0.28]
xn: Float % Range(1.0, None, inclusive_start=False): No vote weight, corresponding to the parameter beta in the scoring function.[default: 8.0]
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs¶

chimeras: FeatureData[Sequence]: The chimeric sequences.[required]
nonchimeras: FeatureData[Sequence]: The non-chimeric sequences.[required]
stats: UchimeStats: Summary statistics from chimera checking.[required]

vsearch uchime-denovo¶

Citations¶

Rognes et al., 2016; Edgar et al., 2011; Edgar, 2016; Edgar, 2016

Inputs¶

sequences: FeatureData[Sequence]: The feature sequences to be chimera-checked.[required]
table: FeatureTable[Frequency]: Feature table (used for computing total feature abundances).[required]

Parameters¶

method: Str % Choices('uchime', 'uchime2', 'uchime3'): Denovo chimera detection based on uchime (Edgar 2011), uchime2 (Edgar 2016), or uchime3 (Edgar 2016).[default: 'uchime']
dn: Float % Range(0.0, None): No vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default: 1.4]
mindiffs: Int % Range(1, None): Minimum number of differences per segment. Ignored for uchime2 and uchime3.[default: 3]
mindiv: Float % Range(0.0, None): Minimum divergence from closest parent. Ignored for uchime2 and uchime3.[default: 0.8]
minh: Float % Range(0.0, 1.0, inclusive_end=True): Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity. Ignored for uchime2 and uchime3.[default: 0.28]
xn: Float % Range(1.0, None, inclusive_start=False): No vote weight, corresponding to the parameter beta in the scoring function.[default: 8.0]

Outputs¶

chimeras: FeatureData[Sequence]: The chimeric sequences.[required]
nonchimeras: FeatureData[Sequence]: The non-chimeric sequences.[required]
stats: UchimeStats: Summary statistics from chimera checking.[required]

vsearch fastq-stats¶

A fastq overview via vsearch's fastq_stats, fastq_eestats and fastq_eestats2 utilities. Please see https://github.com/torognes/vsearch for detailed documentation of these tools.

Citations¶

Inputs¶

sequences: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]: Fastq sequences[required]

Parameters¶

threads: Threads: The number of threads used for computation.[default: 1]

Outputs¶

visualization: Visualization: <no description>[required]

vsearch cluster-features-open-reference¶

Given a feature table and the associated feature sequences, cluster the features against a reference database based on user-specified percent identity threshold of their sequences. Any sequences that don't match are then clustered de novo. This is not a general-purpose clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. For features that match a reference sequence, the centroid feature is that reference sequence, so its identifier will become the feature identifier. The clustered_sequences result will contain feature representative sequences that are derived from the sequences input for all features in clustered_table. This will always be the most abundant sequence in the cluster. The new_reference_sequences result will contain the entire reference database, plus feature representative sequences for any de novo features. This is intended to be used as a reference database in subsequent iterations of cluster_features_open_reference, if applicable. See the vsearch documentation for details on how sequence clustering is performed.

Citations¶

Inputs¶

sequences: FeatureData[Sequence]: The sequences corresponding to the features in table.[required]
table: FeatureTable[Frequency]: The feature table to be clustered.[required]
reference_sequences: FeatureData[Sequence]: The sequences to use as cluster centroids.[required]

Parameters¶

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True): The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]
strand: Str % Choices('plus', 'both'): Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs¶

clustered_table: FeatureTable[Frequency]: The table following clustering of features.[required]
clustered_sequences: FeatureData[Sequence]: Sequences representing clustered features.[required]
new_reference_sequences: FeatureData[Sequence]: The new reference sequences. This can be used for subsequent runs of open-reference clustering for consistent definitions of features across open-reference feature tables.[required]

This plugin wraps the vsearch application, and provides methods for clustering and dereplicating features and sequences.

version: 2025.7.0.dev0
website: https://github.com/qiime2/q2-vsearch
user support:: Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
citations:: Rognes et al., 2016

Actions¶

Name	Type	Short Description
cluster-features-de-novo	method	De novo clustering of features.
cluster-features-closed-reference	method	Closed-reference clustering of features.
dereplicate-sequences	method	Dereplicate sequences.
merge-pairs	method	Merge paired-end reads.
uchime-ref	method	Reference-based chimera filtering.
uchime-denovo	method	De novo chimera filtering.
fastq-stats	visualizer	Fastq stats with vsearch.
cluster-features-open-reference	pipeline	Open-reference clustering of features.

Artifact Classes¶

Formats¶

vsearch cluster-features-de-novo¶

Citations¶

Inputs¶

sequences: FeatureData[Sequence]: The sequences corresponding to the features in table.[required]
table: FeatureTable[Frequency]: The feature table to be clustered.[required]

Parameters¶

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True): The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]
strand: Str % Choices('plus', 'both'): Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs¶

clustered_table: FeatureTable[Frequency]: The table following clustering of features.[required]
clustered_sequences: FeatureData[Sequence]: Sequences representing clustered features.[required]

Examples¶

cluster_features_de_novo¶

[Command Line]

[Python API]

[Galaxy]

[R API]

[View Source]

wget -O 'seqs1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'

wget -O 'table1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'

qiime vsearch cluster-features-de-novo \
  --i-sequences seqs1.qza \
  --i-table table1.qza \
  --p-perc-identity 0.97 \
  --p-strand plus \
  --p-threads 1 \
  --o-clustered-table clustered-table.qza \
  --o-clustered-sequences clustered-sequences.qza

from qiime2 import Artifact
from urllib import request
import qiime2.plugins.vsearch.actions as vsearch_actions

url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn = 'seqs1.qza'
request.urlretrieve(url, fn)
seqs1 = Artifact.load(fn)

url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn = 'table1.qza'
request.urlretrieve(url, fn)
table1 = Artifact.load(fn)

clustered_table, clustered_sequences = vsearch_actions.cluster_features_de_novo(
    sequences=seqs1,
    table=table1,
    perc_identity=0.97,
    strand='plus',
    threads=1,
)

library(reticulate)

Artifact <- import("qiime2")$Artifact
request <- import("urllib")$request
vsearch_actions <- import("qiime2.plugins.vsearch.actions")

url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn <- 'seqs1.qza'
request$urlretrieve(url, fn)
seqs1 <- Artifact$load(fn)

url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn <- 'table1.qza'
request$urlretrieve(url, fn)
table1 <- Artifact$load(fn)

action_results <- vsearch_actions$cluster_features_de_novo(
    sequences=seqs1,
    table=table1,
    perc_identity=0.97,
    strand='plus',
    threads=1L
)
clustered_table <- action_results$clustered_table
clustered_sequences <- action_results$clustered_sequences


vsearch cluster-features-closed-reference

Given a feature table and the associated feature sequences, cluster the features against a reference database based on a user-specified percent identity threshold of their sequences. This is not a general-purpose closed-reference clustering method; rather, it is intended for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table is clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the clustered features in that sample. Feature identifiers will be inherited from the centroid feature of each cluster, which for closed-reference clustering is the matching reference sequence. See the vsearch documentation for details on how sequence clustering is performed.
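To make the frequency rule concrete, here is a minimal pure-Python sketch. It is not the q2-vsearch implementation: the nested-dict table and the feature_to_cluster mapping are hypothetical stand-ins for a FeatureTable[Frequency] and for the cluster assignments vsearch reports.

```python
from collections import defaultdict


def collapse_table(table, feature_to_cluster):
    """Sum each sample's frequencies over the features assigned to one cluster.

    table: {sample_id: {feature_id: frequency}}
    feature_to_cluster: {feature_id: cluster_id}, a hypothetical mapping
    (e.g. parsed from vsearch's cluster assignments). Features missing from
    the mapping are dropped, as unmatched features are in closed-reference mode.
    """
    collapsed = {}
    for sample, counts in table.items():
        sums = defaultdict(int)
        for feature, frequency in counts.items():
            cluster = feature_to_cluster.get(feature)
            if cluster is not None:
                sums[cluster] += frequency
        collapsed[sample] = dict(sums)
    return collapsed


table = {
    "S1": {"f1": 10, "f2": 5, "f3": 2},
    "S2": {"f1": 0, "f2": 7, "f3": 4},
}
mapping = {"f1": "ref_A", "f2": "ref_A", "f3": "ref_B"}
print(collapse_table(table, mapping))
# {'S1': {'ref_A': 15, 'ref_B': 2}, 'S2': {'ref_A': 7, 'ref_B': 4}}
```

Because f1 and f2 share a cluster, their per-sample frequencies are added, while sample identities are left untouched.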

Citations

Inputs

sequences: FeatureData[Sequence]: The sequences corresponding to the features in table.[required]
table: FeatureTable[Frequency]: The feature table to be clustered.[required]
reference_sequences: FeatureData[Sequence]: The sequences to use as cluster centroids.[required]

Parameters

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True): The percent identity at which clustering should be performed, expressed as a fraction (e.g., 0.97 for 97% identity). This parameter maps to vsearch's --id parameter.[required]
strand: Str % Choices('plus', 'both'): Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]
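Because perc_identity is a fraction, 0.97 requires at least 97% identity. The sketch below shows one way such a fraction can be computed from an existing pairwise alignment. It assumes vsearch's default identity definition (--iddef 2, identity excluding terminal gaps) as we understand it; vsearch computes its own alignments, so the pre-aligned inputs and the function name pairwise_identity are illustrative only.

```python
def pairwise_identity(aln_a, aln_b, exclude_terminal_gaps=True):
    """Fraction of identical columns in a pairwise alignment ('-' marks gaps).

    With exclude_terminal_gaps=True, leading and trailing gap columns are
    ignored, which is our reading of vsearch's default --iddef 2 definition.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aln_a, aln_b))
    if exclude_terminal_gaps:
        start, end = 0, len(columns)
        while start < end and "-" in columns[start]:
            start += 1
        while end > start and "-" in columns[end - 1]:
            end -= 1
        columns = columns[start:end]
    if not columns:
        return 0.0
    # Internal gap columns stay in the denominator but never count as matches.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return matches / len(columns)


perc_identity = 0.97  # same value as --p-perc-identity 0.97 in the example
for a, b in [("ACGTACGTAC", "ACGTTCGTAC"), ("--GTACGTAC", "ACGTACGTAC")]:
    score = pairwise_identity(a, b)
    verdict = "clustered" if score >= perc_identity else "not clustered"
    print(a, b, round(score, 3), verdict)
```

A single mismatch in 10 columns gives 0.9, below the 0.97 threshold, whereas the leading gaps in the second pair are not penalised.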

Outputs

clustered_table: FeatureTable[Frequency]: The table following clustering of features.[required]
clustered_sequences: FeatureData[Sequence]: The sequences representing clustered features, relabeled by the reference IDs.[required]
unmatched_sequences: FeatureData[Sequence]: The sequences that failed to match any reference sequence. This output maps to vsearch's --notmatched parameter.[required]
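The split between clustered_sequences and unmatched_sequences can be sketched as follows, assuming best-hit identities have already been computed (the hits dictionary and the helper name are hypothetical). vsearch's actual hit acceptance rules may differ, for example in how ties between equally good references are resolved.

```python
def closed_reference_assign(hits, perc_identity):
    """Split features into reference-labelled clusters and unmatched features.

    hits: {feature_id: [(reference_id, identity), ...]}, hypothetical
    precomputed identities (vsearch computes these itself during the search).
    """
    assignments, unmatched = {}, []
    for feature, candidates in hits.items():
        # Best-scoring reference; on ties max() keeps the first candidate.
        best = max(candidates, key=lambda hit: hit[1], default=None)
        if best is not None and best[1] >= perc_identity:
            assignments[feature] = best[0]  # relabelled by the reference ID
        else:
            unmatched.append(feature)  # would end up in unmatched_sequences
    return assignments, unmatched


hits = {
    "f1": [("ref_A", 0.99), ("ref_B", 0.95)],
    "f2": [("ref_B", 0.96)],
    "f3": [],
}
print(closed_reference_assign(hits, perc_identity=0.97))
# ({'f1': 'ref_A'}, ['f2', 'f3'])
```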

vsearch dereplicate-sequences

Citations

Inputs

sequences: SampleData[Sequences] | SampleData[SequencesWithQuality] | SampleData[JoinedSequencesWithQuality]: The sequences to be dereplicated.[required]

Parameters

derep_prefix: Bool: Merge sequences with identical prefixes. If a sequence is identical to the prefix of two or more longer sequences, it is clustered with the shortest of them. If they are equally long, it is clustered with the most abundant.[default: False]
min_seq_length: Int % Range(1, None): Discard sequences shorter than this length.[default: 1]
min_unique_size: Int % Range(1, None): Discard sequences with a post-dereplication abundance value smaller than this integer.[default: 1]
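A toy model of the three dereplication parameters is sketched below. The order of operations (length filter, dereplication, optional prefix merging, abundance filter), the transitive handling of prefix chains, and the use of the sequence itself as a feature ID are assumptions for illustration, not a description of the plugin's internals.

```python
from collections import Counter


def dereplicate(reads, min_seq_length=1, min_unique_size=1, derep_prefix=False):
    """Toy dereplication of (sample_id, sequence) reads.

    Returns (table, sequences): table is {sample_id: {sequence: count}} and
    sequences lists the retained unique sequences. Using the sequence itself
    as the feature ID is a simplification for this sketch.
    """
    # 1. min_seq_length: drop short reads before dereplication.
    kept = [(sample, seq) for sample, seq in reads if len(seq) >= min_seq_length]
    # 2. Exact dereplication: abundance of each unique sequence.
    abundance = Counter(seq for _, seq in kept)
    # 3. derep_prefix: map a sequence onto the shortest longer sequence it is
    #    a prefix of, breaking length ties by abundance.
    parent = {seq: seq for seq in abundance}
    if derep_prefix:
        for seq in abundance:
            longer = [other for other in abundance
                      if len(other) > len(seq) and other.startswith(seq)]
            if longer:
                parent[seq] = min(longer, key=lambda o: (len(o), -abundance[o]))

    def representative(seq):
        # Assumption: prefix chains (A -> B -> C) are followed to the end.
        while parent[seq] != seq:
            seq = parent[seq]
        return seq

    table, totals = {}, Counter()
    for sample, seq in kept:
        rep = representative(seq)
        table.setdefault(sample, Counter())[rep] += 1
        totals[rep] += 1
    # 4. min_unique_size: drop features whose total abundance is too small.
    retained = {rep for rep, n in totals.items() if n >= min_unique_size}
    table = {sample: {rep: n for rep, n in counts.items() if rep in retained}
             for sample, counts in table.items()}
    return table, sorted(retained)


reads = [("S1", "ACGTAC"), ("S1", "ACGTAC"), ("S2", "ACGTAC"),
         ("S1", "ACG"), ("S2", "TTTTTT"), ("S2", "AC")]
print(dereplicate(reads, min_seq_length=3, min_unique_size=2))
print(dereplicate(reads, min_seq_length=3, min_unique_size=2, derep_prefix=True))
```

With derep_prefix=True the single ACG read is absorbed into ACGTAC, raising its count in S1 from 2 to 3.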

Outputs

dereplicated_table: FeatureTable[Frequency]: The table of dereplicated sequences.[required]
dereplicated_sequences: FeatureData[Sequence]: The dereplicated sequences.[required]

vsearch merge-pairs

Citations

Inputs

demultiplexed_seqs: SampleData[PairedEndSequencesWithQuality]: The demultiplexed paired-end sequences to be merged.[required]

Parameters

truncqual: Int % Range(0, None): Truncate sequences at the first base with the specified quality score value or lower.[optional]
minlen: Int % Range(0, None): Sequences shorter than minlen after truncation are discarded.[default: 1]
maxns: Int % Range(0, None): Sequences with more than maxns N characters are discarded.[optional]
allowmergestagger: Bool: Allow merging of staggered read pairs.[default: False]
minovlen: Int % Range(5, None): Minimum length of the area of overlap between reads during merging.[default: 10]
maxdiffs: Int % Range(0, None): Maximum number of mismatches in the area of overlap during merging.[default: 10]
minmergelen: Int % Range(0, None): Minimum length of the merged read to be retained.[optional]
maxmergelen: Int % Range(0, None): Maximum length of the merged read to be retained.[optional]
maxee: Float % Range(0.0, None): Maximum number of expected errors in the merged read to be retained.[optional]
threads: Threads: The number of threads to use for computation. Does not scale much past 4 threads.[default: 1]
qmin: Int % Range(0, None): Minimum quality score accepted when reading FASTQ files.[default: 0]
qminout: Int % Range(0, None): Minimum quality score used when writing FASTQ files. Only applies to the overlap region.[default: 0]
qmax: Int % Range(0, None): Maximum quality score accepted when reading FASTQ files.[default: 41]
qmaxout: Int % Range(0, None): Maximum quality score used when writing FASTQ files. Only applies to the overlap region.[default: 41]
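The quality-related parameters rest on standard Phred arithmetic: a base with quality Q has error probability 10^(-Q/10), and the expected number of errors in a read is the sum of these probabilities. The helper names below are hypothetical, Phred+33 encoding is assumed, and which filters apply to input reads versus the merged read reflects our reading of the parameter descriptions above.

```python
def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed by default)."""
    return [ord(ch) - offset for ch in qual]


def truncate_at_quality(seq, qual, truncqual):
    """truncqual: cut the read at the first base with quality <= truncqual."""
    for i, q in enumerate(phred_scores(qual)):
        if q <= truncqual:
            return seq[:i], qual[:i]
    return seq, qual


def read_passes(seq, minlen=1, maxns=None):
    """minlen / maxns checks applied to a (truncated) input read."""
    if len(seq) < minlen:
        return False
    return maxns is None or seq.upper().count("N") <= maxns


def expected_errors(qual):
    """Sum of per-base error probabilities 10 ** (-Q / 10); compared with maxee."""
    return sum(10 ** (-q / 10) for q in phred_scores(qual))


# '#' encodes Q2, so the read is cut just before it.
seq, qual = truncate_at_quality("ACGTNACGT", "IIIII#III", truncqual=2)
print(seq, qual, read_passes(seq, minlen=4, maxns=0))
# Ten Q10 bases ('+') contribute 0.1 expected errors each.
print(round(expected_errors("+" * 10), 6))
```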

Outputs

merged_sequences: SampleData[JoinedSequencesWithQuality]: The merged sequences.[required]
unmerged_sequences: SampleData[PairedEndSequencesWithQuality]: The unmerged paired-end reads.[required]
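Conceptually, merging compares the 3' end of the forward read with the reverse complement of the reverse read. The following ungapped sketch shows how minovlen and maxdiffs decide whether a pair ends up in merged_sequences or unmerged_sequences. It is a simplification: vsearch aligns the reads, uses quality scores to resolve mismatches, and can handle staggered pairs (allowmergestagger), none of which this toy does. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd, rev, minovlen=10, maxdiffs=10):
    """Ungapped toy merge of a read pair; returns None when the pair stays unmerged.

    The end of the forward read is compared with the start of the
    reverse-complemented reverse read for every overlap length >= minovlen,
    and the overlap with the fewest mismatches (ties: longer overlap) wins.
    """
    rc = reverse_complement(rev)
    best = None  # (mismatches, -overlap_length)
    for ov in range(minovlen, min(len(fwd), len(rc)) + 1):
        diffs = sum(a != b for a, b in zip(fwd[len(fwd) - ov:], rc[:ov]))
        if diffs <= maxdiffs and (best is None or (diffs, -ov) < best):
            best = (diffs, -ov)
    if best is None:
        return None
    overlap = -best[1]
    # Simplification: mismatching overlap positions keep the forward base.
    return fwd + rc[overlap:]


fwd = "ACGTTGCAACGGAT"
rev = reverse_complement("CAACGGATCCTAGG")  # simulated reverse read
print(merge_pair(fwd, rev, minovlen=5, maxdiffs=0))
```

The two reads share an 8-base overlap (CAACGGAT), so the merged read is 14 + 14 - 8 = 20 bases long.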

vsearch uchime-ref

Citations

Inputs

sequences: FeatureData[Sequence]: The feature sequences to be chimera-checked.[required]
table: FeatureTable[Frequency]: Feature table (used for computing total feature abundances).[required]
reference_sequences: FeatureData[Sequence]: The non-chimeric reference sequences.[required]
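Both chimera-checking methods take a feature table only to derive total feature abundances. A minimal sketch of that step follows, assuming abundances are summed across samples and handed to vsearch in its ';size=N' header annotation; that hand-off is our assumption about the plugin, and both helper names are hypothetical.

```python
def total_abundances(table):
    """Total frequency of each feature across all samples of a feature table.

    table: {sample_id: {feature_id: frequency}} (a stand-in for FeatureTable[Frequency]).
    """
    totals = {}
    for counts in table.values():
        for feature, frequency in counts.items():
            totals[feature] = totals.get(feature, 0) + frequency
    return totals


def size_annotated_ids(totals):
    """Feature IDs in vsearch's ';size=N' header style, most abundant first."""
    ordered = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [f"{feature};size={n}" for feature, n in ordered]


table = {"S1": {"f1": 5, "f2": 1}, "S2": {"f1": 3, "f3": 2}}
print(size_annotated_ids(total_abundances(table)))
# ['f1;size=8', 'f3;size=2', 'f2;size=1']
```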

Parameters

dn: Float % Range(0.0, None): No-vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default: 1.4]
mindiffs: Int % Range(1, None): Minimum number of differences per segment.[default: 3]
mindiv: Float % Range(0.0, None): Minimum divergence from the closest parent.[default: 0.8]
minh: Float % Range(0.0, 1.0, inclusive_end=True): Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity.[default: 0.28]
xn: Float % Range(1.0, None, inclusive_start=False): No-vote weight, corresponding to the parameter beta in the scoring function.[default: 8.0]
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs

chimeras: FeatureData[Sequence]: The chimeric sequences.[required]
nonchimeras: FeatureData[Sequence]: The non-chimeric sequences.[required]
stats: UchimeStats: Summary statistics from chimera checking.[required]

vsearch uchime-denovo

Citations

Rognes et al., 2016; Edgar et al., 2011; Edgar, 2016; Edgar, 2016

Inputs

sequences: FeatureData[Sequence]: The feature sequences to be chimera-checked.[required]
table: FeatureTable[Frequency]: Feature table (used for computing total feature abundances).[required]

Parameters

method: Str % Choices('uchime', 'uchime2', 'uchime3'): De novo chimera detection based on uchime (Edgar 2011), uchime2 (Edgar 2016), or uchime3 (Edgar 2016).[default: 'uchime']
dn: Float % Range(0.0, None): No-vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default: 1.4]
mindiffs: Int % Range(1, None): Minimum number of differences per segment. Ignored for uchime2 and uchime3.[default: 3]
mindiv: Float % Range(0.0, None): Minimum divergence from the closest parent. Ignored for uchime2 and uchime3.[default: 0.8]
minh: Float % Range(0.0, 1.0, inclusive_end=True): Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity. Ignored for uchime2 and uchime3.[default: 0.28]
xn: Float % Range(1.0, None, inclusive_start=False): No-vote weight, corresponding to the parameter beta in the scoring function.[default: 8.0]
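Which of the parameters above actually take effect depends on method. The sketch encodes the documented rule that mindiffs, mindiv, and minh are ignored for uchime2 and uchime3, plus the documented ranges of minh and xn. The mapping of each method to vsearch's --uchime_denovo, --uchime2_denovo, and --uchime3_denovo commands is an assumption about how the choice is forwarded, and effective_parameters is a hypothetical helper.

```python
UCHIME_DEFAULTS = {"dn": 1.4, "mindiffs": 3, "mindiv": 0.8, "minh": 0.28, "xn": 8.0}
# Assumed mapping from the method parameter to vsearch commands.
VSEARCH_COMMANDS = {
    "uchime": "--uchime_denovo",
    "uchime2": "--uchime2_denovo",
    "uchime3": "--uchime3_denovo",
}
IGNORED_BY_UCHIME2_3 = {"mindiffs", "mindiv", "minh"}


def effective_parameters(method="uchime", **overrides):
    """Return (vsearch command, parameters that still apply) for a method."""
    if method not in VSEARCH_COMMANDS:
        raise ValueError(f"unknown method: {method!r}")
    unknown = set(overrides) - set(UCHIME_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = {**UCHIME_DEFAULTS, **overrides}
    # Range checks mirroring the documented types.
    if not 0.0 <= params["minh"] <= 1.0:
        raise ValueError("minh must be in [0.0, 1.0]")
    if params["xn"] <= 1.0:
        raise ValueError("xn must be greater than 1.0")
    if method != "uchime":
        params = {k: v for k, v in params.items() if k not in IGNORED_BY_UCHIME2_3}
    return VSEARCH_COMMANDS[method], params


print(effective_parameters("uchime2", minh=0.5))
# ('--uchime2_denovo', {'dn': 1.4, 'xn': 8.0})
```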

Outputs

chimeras: FeatureData[Sequence]: The chimeric sequences.[required]
nonchimeras: FeatureData[Sequence]: The non-chimeric sequences.[required]
stats: UchimeStats: Summary statistics from chimera checking.[required]

vsearch fastq-stats

A FASTQ overview via vsearch's fastq_stats, fastq_eestats and fastq_eestats2 utilities. See https://github.com/torognes/vsearch for detailed documentation of these tools.

Citations

Inputs

sequences: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]: FASTQ sequences.[required]

Parameters

threads: Threads: The number of threads used for computation.[default: 1]

Outputs

visualization: Visualization: <no description>[required]

vsearch cluster-features-open-reference

Given a feature table and the associated feature sequences, cluster the features against a reference database based on a user-specified percent identity threshold of their sequences. Any sequences that don't match are then clustered de novo. This is not a general-purpose clustering method; rather, it is intended for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table is clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the clustered features in that sample.

Feature identifiers will be inherited from the centroid feature of each cluster. For features that match a reference sequence, the centroid feature is that reference sequence, so its identifier becomes the feature identifier. The clustered_sequences result contains a representative sequence, taken from the input sequences, for every feature in clustered_table; this is always the most abundant sequence in the cluster. The new_reference_sequences result contains the entire reference database plus the representative sequences of any de novo features. It is intended to be used as the reference database in subsequent runs of cluster_features_open_reference, if applicable. See the vsearch documentation for details on how sequence clustering is performed.
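The open-reference workflow described above can be sketched in three steps: a closed-reference pass, greedy de novo clustering of the leftovers in order of decreasing abundance, and selection of the most abundant member as each cluster's representative. The equal-length toy_identity function and the first-match greedy rule are simplifications standing in for vsearch's alignment-based search; all names here are hypothetical.

```python
def toy_identity(a, b):
    """Ungapped identity for equal-length sequences (stand-in for vsearch's alignment)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def open_reference(seqs, abundance, refs, perc_identity, identity=toy_identity):
    """seqs: {feature_id: sequence}; abundance: {feature_id: total frequency};
    refs: {reference_id: sequence}.

    Returns (assignment, clustered_sequences, new_reference_sequences).
    """
    assignment, unmatched = {}, []
    # Step 1: closed-reference pass; matched features take the reference ID.
    for fid, seq in seqs.items():
        scored = [(identity(seq, rseq), rid) for rid, rseq in refs.items()]
        best_score, best_id = max(scored, default=(0.0, None))
        if best_id is not None and best_score >= perc_identity:
            assignment[fid] = best_id
        else:
            unmatched.append(fid)
    # Step 2: greedy de novo clustering of the leftovers, most abundant first.
    centroids = {}
    for fid in sorted(unmatched, key=lambda f: (-abundance[f], f)):
        for cid, cseq in centroids.items():
            if identity(seqs[fid], cseq) >= perc_identity:
                assignment[fid] = cid
                break
        else:  # no existing centroid is close enough: start a new cluster
            centroids[fid] = seqs[fid]
            assignment[fid] = fid
    # Step 3: each cluster is represented by its most abundant member.
    best_member = {}
    for fid, cid in assignment.items():
        if cid not in best_member or abundance[fid] > abundance[best_member[cid]]:
            best_member[cid] = fid
    clustered_sequences = {cid: seqs[fid] for cid, fid in best_member.items()}
    new_reference_sequences = {**refs, **centroids}
    return assignment, clustered_sequences, new_reference_sequences


refs = {"R1": "ACGTACGTAC"}
seqs = {"f1": "ACGTACGTAA", "f2": "TTTTTTTTTT", "f3": "TTTTTTTTTA", "f4": "GGGGGGGGGG"}
abundance = {"f1": 5, "f2": 9, "f3": 2, "f4": 1}
assignment, clustered, new_ref = open_reference(seqs, abundance, refs, 0.9)
print(assignment)
print(clustered)
print(new_ref)
```

Note that cluster R1 is represented by f1's sequence rather than the reference sequence itself, matching the description of clustered_sequences, while new_ref carries R1 forward together with the de novo centroids f2 and f4.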

Citations

Inputs

sequences: FeatureData[Sequence]: The sequences corresponding to the features in table.[required]
table: FeatureTable[Frequency]: The feature table to be clustered.[required]
reference_sequences: FeatureData[Sequence]: The sequences to use as cluster centroids.[required]

Parameters

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True): The percent identity at which clustering should be performed, expressed as a fraction (e.g., 0.97 for 97% identity). This parameter maps to vsearch's --id parameter.[required]
strand: Str % Choices('plus', 'both'): Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs

clustered_table: FeatureTable[Frequency]: The table following clustering of features.[required]
clustered_sequences: FeatureData[Sequence]: Sequences representing clustered features.[required]
new_reference_sequences: FeatureData[Sequence]: The new reference sequences. These can be used in subsequent runs of open-reference clustering to keep feature definitions consistent across open-reference feature tables.[required]

This plugin wraps the vsearch application and provides methods for clustering and dereplicating features and sequences.

version: 2025.7.0.dev0
website: https://github.com/qiime2/q2-vsearch
user support: Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
citations: Rognes et al., 2016

Actions

Name	Type	Short Description
cluster-features-de-novo	method	De novo clustering of features.
cluster-features-closed-reference	method	Closed-reference clustering of features.
dereplicate-sequences	method	Dereplicate sequences.
merge-pairs	method	Merge paired-end reads.
uchime-ref	method	Reference-based chimera filtering.
uchime-denovo	method	De novo chimera filtering.
fastq-stats	visualizer	Fastq stats with vsearch.
cluster-features-open-reference	pipeline	Open-reference clustering of features.

Artifact Classes

Formats

vsearch cluster-features-de-novo

Citations

Inputs

sequences: FeatureData[Sequence]: The sequences corresponding to the features in table.[required]
table: FeatureTable[Frequency]: The feature table to be clustered.[required]

Parameters

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True): The percent identity at which clustering should be performed, expressed as a fraction (e.g., 0.97 for 97% identity). This parameter maps to vsearch's --id parameter.[required]
strand: Str % Choices('plus', 'both'): Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs

clustered_table: FeatureTable[Frequency]: The table following clustering of features.[required]
clustered_sequences: FeatureData[Sequence]: Sequences representing clustered features.[required]

Examples

cluster_features_de_novo

Command Line:

wget -O 'seqs1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'

wget -O 'table1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'

qiime vsearch cluster-features-de-novo \
  --i-sequences seqs1.qza \
  --i-table table1.qza \
  --p-perc-identity 0.97 \
  --p-strand plus \
  --p-threads 1 \
  --o-clustered-table clustered-table.qza \
  --o-clustered-sequences clustered-sequences.qza

Python API:

from qiime2 import Artifact
from urllib import request
import qiime2.plugins.vsearch.actions as vsearch_actions

url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn = 'seqs1.qza'
request.urlretrieve(url, fn)
seqs1 = Artifact.load(fn)

url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn = 'table1.qza'
request.urlretrieve(url, fn)
table1 = Artifact.load(fn)

clustered_table, clustered_sequences = vsearch_actions.cluster_features_de_novo(
    sequences=seqs1,
    table=table1,
    perc_identity=0.97,
    strand='plus',
    threads=1,
)

R API:

library(reticulate)

Artifact <- import("qiime2")$Artifact
request <- import("urllib")$request
vsearch_actions <- import("qiime2.plugins.vsearch.actions")

url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn <- 'seqs1.qza'
request$urlretrieve(url, fn)
seqs1 <- Artifact$load(fn)

url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn <- 'table1.qza'
request$urlretrieve(url, fn)
table1 <- Artifact$load(fn)

action_results <- vsearch_actions$cluster_features_de_novo(
    sequences=seqs1,
    table=table1,
    perc_identity=0.97,
    strand='plus',
    threads=1L
)
clustered_table <- action_results$clustered_table
clustered_sequences <- action_results$clustered_sequences


vsearch cluster-features-closed-reference¶

Given a feature table and the associated feature sequences, cluster the features against a reference database based on user-specified percent identity threshold of their sequences. This is not a general-purpose closed-reference clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.

Citations¶

Inputs¶

sequences: FeatureData[Sequence]: The sequences corresponding to the features in table.[required]
table: FeatureTable[Frequency]: The feature table to be clustered.[required]
reference_sequences: FeatureData[Sequence]: The sequences to use as cluster centroids.[required]

Parameters¶

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True): The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]
strand: Str % Choices('plus', 'both'): Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs¶

clustered_table: FeatureTable[Frequency]: The table following clustering of features.[required]
clustered_sequences: FeatureData[Sequence]: The sequences representing clustered features, relabeled by the reference IDs.[required]
unmatched_sequences: FeatureData[Sequence]: The sequences which failed to match any reference sequences. This output maps to vsearch's --notmatched parameter.[required]
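A toy version of the reference search shows how perc_identity acts as a 0 to 1 cutoff and where unmatched_sequences comes from. The `identity` function below is a gapless, equal-length stand-in chosen only for illustration (vsearch performs real alignments and applies its own identity definition and hit-selection rules), and `closed_reference` is a hypothetical helper name.

```python
def identity(a, b):
    """Toy identity: fraction of matching positions for equal-length sequences.

    Returns 0.0 when lengths differ; real vsearch aligns sequences instead.
    """
    if len(a) != len(b):
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def closed_reference(features, references, perc_identity):
    """Assign each feature to its best reference hit at or above perc_identity.

    Returns (assignment, unmatched): assignment maps feature ID -> reference ID;
    unmatched maps feature ID -> sequence for features with no acceptable hit.
    """
    assignment, unmatched = {}, {}
    for fid, seq in features.items():
        best_id, best_score = None, 0.0
        for rid, ref in references.items():
            score = identity(seq, ref)
            if score >= perc_identity and score > best_score:
                best_id, best_score = rid, score
        if best_id is None:
            unmatched[fid] = seq
        else:
            assignment[fid] = best_id
    return assignment, unmatched


features = {"f1": "ACGTACGTAC", "f2": "ACGTACGTAA", "f3": "TTTTTTTTTT"}
references = {"ref-A": "ACGTACGTAC"}
print(closed_reference(features, references, 0.9))
# ({'f1': 'ref-A', 'f2': 'ref-A'}, {'f3': 'TTTTTTTTTT'})
```

Raising the threshold to 0.95 would push f2 (9 of 10 positions matching) into the unmatched set.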

vsearch dereplicate-sequences¶

Citations¶

Inputs¶

sequences: SampleData[Sequences] | SampleData[SequencesWithQuality] | SampleData[JoinedSequencesWithQuality]: The sequences to be dereplicated.[required]

Parameters¶

derep_prefix: Bool: Merge sequences with identical prefixes. If a sequence is identical to the prefix of two or more longer sequences, it is clustered with the shortest of them. If they are equally long, it is clustered with the most abundant.[default: False]
min_seq_length: Int % Range(1, None): Discard sequences shorter than this length.[default: 1]
min_unique_size: Int % Range(1, None): Discard sequences with a post-dereplication abundance smaller than this value.[default: 1]
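The derep_prefix rule stated above (merge into the shortest longer sequence that starts with the query, breaking length ties by abundance) can be written as a small selection function. This sketch covers only that target-selection rule as worded here; `prefix_target` is a hypothetical name, and how vsearch orders the merging or resolves chains of prefixes is not modelled.

```python
def prefix_target(seq, uniques):
    """Pick the longer unique sequence that `seq` would be merged into.

    uniques: {sequence: abundance}. Among longer sequences that start with
    `seq`, prefer the shortest; among equally short ones, the most abundant.
    Returns None if `seq` is not a prefix of any longer sequence.
    """
    candidates = [u for u in uniques if len(u) > len(seq) and u.startswith(seq)]
    if not candidates:
        return None
    # Sort key: shorter first, then higher abundance (negated for min()).
    return min(candidates, key=lambda u: (len(u), -uniques[u]))


uniques = {"ACGTAA": 3, "ACGTTT": 9, "ACGTAAGG": 50}
print(prefix_target("ACGT", uniques))  # ACGTTT
```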

Outputs¶

dereplicated_table: FeatureTable[Frequency]: The table of dereplicated sequences.[required]
dereplicated_sequences: FeatureData[Sequence]: The dereplicated sequences.[required]
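Exact dereplication with the two length/abundance filters can be sketched in a few lines of standard-library Python. Assumptions to note: reads are given as a dict of per-sample lists, min_unique_size is applied to the total abundance across all samples, and each feature ID is simply the sequence itself (an illustrative choice; the plugin's actual ID scheme is not described here). `dereplicate` is a hypothetical name.

```python
from collections import Counter


def dereplicate(samples, min_seq_length=1, min_unique_size=1):
    """Exact-match dereplication of per-sample reads.

    samples: {sample_id: [sequence, ...]}.
    Returns (table, sequences): table is {sample_id: {feature_id: count}},
    sequences is {feature_id: sequence}; feature IDs are the sequences.
    """
    # Count identical reads per sample, skipping reads below min_seq_length.
    per_sample = {
        s: Counter(r for r in reads if len(r) >= min_seq_length)
        for s, reads in samples.items()
    }
    # Abundance threshold is applied to totals over all samples (assumed).
    totals = Counter()
    for counts in per_sample.values():
        totals.update(counts)
    keep = {seq for seq, n in totals.items() if n >= min_unique_size}
    table = {
        s: {seq: n for seq, n in counts.items() if seq in keep}
        for s, counts in per_sample.items()
    }
    sequences = {seq: seq for seq in keep}
    return table, sequences


samples = {"S1": ["ACGT", "ACGT", "AC"], "S2": ["ACGT", "GGCC"]}
print(dereplicate(samples, min_seq_length=3, min_unique_size=2))
# ({'S1': {'ACGT': 2}, 'S2': {'ACGT': 1}}, {'ACGT': 'ACGT'})
```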

vsearch merge-pairs¶

Citations¶

Inputs¶

demultiplexed_seqs: SampleData[PairedEndSequencesWithQuality]: The demultiplexed paired-end sequences to be merged.[required]

Parameters¶

truncqual: Int % Range(0, None): Truncate sequences at the first base with the specified quality score value or lower.[optional]
minlen: Int % Range(0, None): Sequences shorter than minlen after truncation are discarded.[default: 1]
maxns: Int % Range(0, None): Sequences with more than maxns N characters are discarded.[optional]
allowmergestagger: Bool: Allow merging of staggered read pairs.[default: False]
minovlen: Int % Range(5, None): Minimum length of the area of overlap between reads during merging.[default: 10]
maxdiffs: Int % Range(0, None): Maximum number of mismatches in the area of overlap during merging.[default: 10]
minmergelen: Int % Range(0, None): Minimum length of the merged read to be retained.[optional]
maxmergelen: Int % Range(0, None): Maximum length of the merged read to be retained.[optional]
maxee: Float % Range(0.0, None): Maximum number of expected errors in the merged read to be retained.[optional]
threads: Threads: The number of threads to use for computation. Does not scale much past 4 threads.[default: 1]
qmin: Int % Range(0, None): Minimum quality score accepted when reading FASTQ files.[default: 0]
qminout: Int % Range(0, None): Minimum quality score used when writing FASTQ files. Only applies to overlap region.[default: 0]
qmax: Int % Range(0, None): Maximum quality score accepted when reading FASTQ files.[default: 41]
qmaxout: Int % Range(0, None): Maximum quality score used when writing FASTQ files. Only applies to overlap region.[default: 41]
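The per-read filters truncqual, minlen and maxns can be sketched as a single function applied to one read before merging. This assumes Phred+33 quality encoding and applies the rules exactly as worded above (truncation removes the first base at or below truncqual and everything after it); `prefilter_read` is a hypothetical helper, not vsearch's implementation.

```python
def decode_phred33(qual):
    """Convert a Phred+33 quality string to integer scores."""
    return [ord(c) - 33 for c in qual]


def prefilter_read(seq, qual, truncqual=None, minlen=1, maxns=None):
    """Apply truncqual, minlen and maxns to one read.

    Returns the (possibly truncated) (seq, qual) pair, or None if discarded.
    """
    scores = decode_phred33(qual)
    if truncqual is not None:
        # Truncate at the first base whose quality is <= truncqual.
        for i, q in enumerate(scores):
            if q <= truncqual:
                seq, qual = seq[:i], qual[:i]
                break
    if len(seq) < minlen:
        return None
    if maxns is not None and seq.upper().count("N") > maxns:
        return None
    return seq, qual


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
print(prefilter_read("ACGTNACGT", "IIII#IIII", truncqual=2, minlen=3))
# ('ACGT', 'IIII')
```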

Outputs¶

merged_sequences: SampleData[JoinedSequencesWithQuality]: The merged sequences.[required]
unmerged_sequences: SampleData[PairedEndSequencesWithQuality]: The unmerged paired-end reads.[required]

vsearch uchime-ref¶

Citations¶

Inputs¶

sequences: FeatureData[Sequence]: The feature sequences to be chimera-checked.[required]
table: FeatureTable[Frequency]: Feature table (used for computing total feature abundances).[required]
reference_sequences: FeatureData[Sequence]: The non-chimeric reference sequences.[required]

Parameters¶

dn: Float % Range(0.0, None): No-vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default: 1.4]
mindiffs: Int % Range(1, None): Minimum number of differences per segment.[default: 3]
mindiv: Float % Range(0.0, None): Minimum divergence from closest parent.[default: 0.8]
minh: Float % Range(0.0, 1.0, inclusive_end=True): Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity.[default: 0.28]
xn: Float % Range(1.0, None, inclusive_start=False): No-vote weight, corresponding to the parameter beta in the scoring function.[default: 8.0]
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]
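To show how dn, xn and minh interact, here is a sketch of the scoring step, assuming the UCHIME score form h = Y / (beta * (N + n) + A) from Edgar et al. 2011, where Y, N and A are the yes, no and abstain vote totals from the three-way alignment of a query with its two candidate parents. The vote tallying itself is not modelled, and `uchime_score` is a hypothetical name. In UCHIME, mindiv and mindiffs act as additional criteria beyond the h threshold; they are not shown here.

```python
def uchime_score(yes, no, abstain, dn=1.4, xn=8.0):
    """UCHIME-style score h = Y / (xn * (N + dn) + A), assumed form.

    yes, no, abstain: already tallied vote totals (inputs, not computed here).
    dn plays the role of the pseudo-count n; xn plays the role of beta.
    """
    return yes / (xn * (no + dn) + abstain)


h = uchime_score(yes=10, no=0, abstain=2)
print(round(h, 4), h >= 0.28)  # 0.7576 True
```

Because no votes are multiplied by xn, raising xn or dn lowers h, so fewer candidates clear the default minh of 0.28.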

Outputs¶

chimeras: FeatureData[Sequence]: The chimeric sequences.[required]
nonchimeras: FeatureData[Sequence]: The non-chimeric sequences.[required]
stats: UchimeStats: Summary statistics from chimera checking.[required]

vsearch uchime-denovo¶

Citations¶

Rognes et al., 2016; Edgar et al., 2011; Edgar, 2016; Edgar, 2016

Inputs¶

sequences: FeatureData[Sequence]: The feature sequences to be chimera-checked.[required]
table: FeatureTable[Frequency]: Feature table (used for computing total feature abundances).[required]
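The table input matters for de novo checking because candidate parents are chosen by abundance. The sketch below totals each feature over all samples and, for each feature, lists the features that are at least `skew` times as abundant. It assumes de novo checking only considers more abundant sequences as possible parents; `skew` is an illustrative stand-in for vsearch's --abskew option, which is not among this action's parameters, and both helper names are hypothetical.

```python
from collections import Counter


def total_abundances(table):
    """Sum each feature's frequency over all samples.

    table: {sample_id: {feature_id: count}} (toy layout, assumed).
    """
    totals = Counter()
    for counts in table.values():
        totals.update(counts)
    return totals


def candidate_parents(table, skew=2.0):
    """For each feature, list features at least `skew` times as abundant."""
    totals = total_abundances(table)
    # Process features from most to least abundant.
    order = sorted(totals, key=lambda f: -totals[f])
    return {f: [p for p in order if totals[p] >= skew * totals[f]] for f in order}


table = {"S1": {"f1": 100, "f2": 60, "f3": 5}, "S2": {"f1": 20, "f3": 1}}
print(candidate_parents(table))
# {'f1': [], 'f2': ['f1'], 'f3': ['f1', 'f2']}
```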

Parameters¶

method: Str % Choices('uchime', 'uchime2', 'uchime3'): De novo chimera detection based on uchime (Edgar 2011), uchime2 (Edgar 2016), or uchime3 (Edgar 2016).[default: 'uchime']
dn: Float % Range(0.0, None): No-vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default: 1.4]
mindiffs: Int % Range(1, None): Minimum number of differences per segment. Ignored for uchime2 and uchime3.[default: 3]
mindiv: Float % Range(0.0, None): Minimum divergence from closest parent. Ignored for uchime2 and uchime3.[default: 0.8]
minh: Float % Range(0.0, 1.0, inclusive_end=True): Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity. Ignored for uchime2 and uchime3.[default: 0.28]
xn: Float % Range(1.0, None, inclusive_start=False): No-vote weight, corresponding to the parameter beta in the scoring function.[default: 8.0]

Outputs¶

chimeras: FeatureData[Sequence]: The chimeric sequences.[required]
nonchimeras: FeatureData[Sequence]: The non-chimeric sequences.[required]
stats: UchimeStats: Summary statistics from chimera checking.[required]

vsearch fastq-stats¶

A FASTQ overview via vsearch's fastq_stats, fastq_eestats and fastq_eestats2 utilities. See https://github.com/torognes/vsearch for detailed documentation of these tools.
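The expected-error statistics reported by fastq_eestats and fastq_eestats2 (and the maxee cutoff in merge-pairs) rest on the expected number of errors in a read. A minimal sketch, assuming the standard definition as the sum of per-base error probabilities 10^(-Q/10) and Phred+33 encoding; the helper names are hypothetical, and the per-position aggregation vsearch performs across reads is not reproduced.

```python
def error_probability(char, offset=33):
    """Error probability for one quality character: 10 ** (-Q / 10)."""
    return 10 ** (-(ord(char) - offset) / 10)


def expected_errors(qual, offset=33):
    """Expected number of errors in a read: sum of per-base error probabilities."""
    return sum(error_probability(c, offset) for c in qual)


def cumulative_expected_errors(qual, offset=33):
    """Running expected-error total at each position along the read."""
    running, totals = 0.0, []
    for c in qual:
        running += error_probability(c, offset)
        totals.append(running)
    return totals


# '+' is Q10 (error probability 0.1) and '5' is Q20 (0.01) in Phred+33.
print(round(expected_errors("++55"), 4))  # 0.22
```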

Citations¶

Inputs¶

sequences: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]: FASTQ sequences.[required]

Parameters¶

threads: Threads: The number of threads used for computation.[default: 1]

Outputs¶

visualization: Visualization: <no description>[required]

vsearch cluster-features-open-reference¶

Given a feature table and the associated feature sequences, cluster the features against a reference database based on a user-specified percent identity threshold of their sequences. Any sequences that don't match are then clustered de novo. This is not a general-purpose clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table is clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. For features that match a reference sequence, the centroid feature is that reference sequence, so its identifier will become the feature identifier. The clustered_sequences result will contain a representative sequence, taken from the input sequences, for every feature in clustered_table; each representative is always the most abundant sequence in its cluster. The new_reference_sequences result will contain the entire reference database, plus representative sequences for any de novo features. It is intended to be used as the reference database in subsequent iterations of cluster_features_open_reference, if applicable. See the vsearch documentation for details on how sequence clustering is performed.
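The two stages described above (reference matching, then de novo clustering of the leftovers, then extending the reference set) can be sketched end to end. Assumptions: the gapless equal-length `identity` is a toy stand-in for vsearch's alignment-based identity, the de novo stage is modelled as greedy centroid clustering in order of decreasing total abundance, and `open_reference` is a hypothetical name, not the plugin's pipeline.

```python
def identity(a, b):
    """Toy gapless identity for equal-length sequences (0.0 if lengths differ)."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def open_reference(features, abundances, references, perc_identity):
    """Closed-reference step, then greedy de novo clustering of the leftovers.

    features: {feature_id: sequence}; abundances: {feature_id: total count};
    references: {ref_id: sequence}.
    Returns (assignment, new_references), where assignment maps each input
    feature to a reference ID or a de novo centroid ID.
    """
    assignment, leftovers = {}, []
    for fid, seq in features.items():
        hits = [(identity(seq, ref), rid) for rid, ref in references.items()]
        score, rid = max(hits, default=(0.0, None))
        if rid is not None and score >= perc_identity:
            assignment[fid] = rid
        else:
            leftovers.append(fid)

    # De novo step: the most abundant leftover features become centroids first.
    centroids = {}
    for fid in sorted(leftovers, key=lambda f: -abundances[f]):
        for cid, cseq in centroids.items():
            if identity(features[fid], cseq) >= perc_identity:
                assignment[fid] = cid
                break
        else:
            centroids[fid] = features[fid]
            assignment[fid] = fid

    # Reference database plus de novo centroids, reusable in a later run.
    new_references = {**references, **centroids}
    return assignment, new_references


features = {"f1": "ACGTACGTAC", "f2": "TTTTTTTTTT", "f3": "TTTTTTTTTA"}
abundances = {"f1": 50, "f2": 30, "f3": 10}
references = {"ref-A": "ACGTACGTAC"}
print(open_reference(features, abundances, references, 0.9))
# ({'f1': 'ref-A', 'f2': 'f2', 'f3': 'f2'}, {'ref-A': 'ACGTACGTAC', 'f2': 'TTTTTTTTTT'})
```

Feeding new_references back in as the reference set would make f2 a reference hit in a later run, which is the consistency new_reference_sequences is meant to provide.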

Citations¶

Inputs¶

sequences: FeatureData[Sequence]: The sequences corresponding to the features in table.[required]
table: FeatureTable[Frequency]: The feature table to be clustered.[required]
reference_sequences: FeatureData[Sequence]: The sequences to use as cluster centroids.[required]

Parameters¶

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True): The percent identity at which clustering should be performed, expressed as a fraction greater than 0 and at most 1 (e.g., 0.97 for 97%). This parameter maps to vsearch's --id parameter.[required]
strand: Str % Choices('plus', 'both'): Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
threads: Threads: The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs¶

clustered_table: FeatureTable[Frequency]: The table following clustering of features.[required]
clustered_sequences: FeatureData[Sequence]: Sequences representing clustered features.[required]
new_reference_sequences: FeatureData[Sequence]: The new reference sequences. This can be used for subsequent runs of open-reference clustering for consistent definitions of features across open-reference feature tables.[required]

This plugin wraps the vsearch application, and provides methods for clustering and dereplicating features and sequences.

version: 2025.7.0.dev0
website: https://github.com/qiime2/q2-vsearch
user support: Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
citations: Rognes et al., 2016

Actions¶

Name	Type	Short Description
cluster-features-de-novo	method	De novo clustering of features.
cluster-features-closed-reference	method	Closed-reference clustering of features.
dereplicate-sequences	method	Dereplicate sequences.
merge-pairs	method	Merge paired-end reads.
uchime-ref	method	Reference-based chimera filtering.
uchime-denovo	method	De novo chimera filtering.
fastq-stats	visualizer	Fastq stats with vsearch.
cluster-features-open-reference	pipeline	Open-reference clustering of features.

vsearch cluster-features-open-reference¶

Given a feature table and the associated feature sequences, cluster the features against a reference database based on user-specified percent identity threshold of their sequences. Any sequences that don't match are then clustered de novo. This is not a general-purpose clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. For features that match a reference sequence, the centroid feature is that reference sequence, so its identifier will become the feature identifier. The clustered_sequences result will contain feature representative sequences that are derived from the sequences input for all features in clustered_table. This will always be the most abundant sequence in the cluster. The new_reference_sequences result will contain the entire reference database, plus feature representative sequences for any de novo features. This is intended to be used as a reference database in subsequent iterations of cluster_features_open_reference, if applicable. See the vsearch documentation for details on how sequence clustering is performed.

Citations¶