This plugin wraps the vsearch application and provides methods for clustering and dereplicating features and sequences.

version: 2024.10.0
website: https://github.com/qiime2/q2-vsearch
user support:
Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
citations:
Rognes et al., 2016

Actions

Name                                Type        Short Description
cluster-features-de-novo            method      De novo clustering of features.
cluster-features-closed-reference   method      Closed-reference clustering of features.
dereplicate-sequences               method      Dereplicate sequences.
merge-pairs                         method      Merge paired-end reads.
uchime-ref                          method      Reference-based chimera filtering with vsearch.
uchime-denovo                       method      De novo chimera filtering with vsearch.
fastq-stats                         visualizer  Fastq stats with vsearch.
cluster-features-open-reference     pipeline    Open-reference clustering of features.

Artifact Classes

UchimeStats

Formats

UchimeStatsFmt
UchimeStatsDirFmt


vsearch cluster-features-de-novo

Given a feature table and the associated feature sequences, cluster the features based on a user-specified percent identity threshold of their sequences. This is not a general-purpose de novo clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table is clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers and sequences will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.
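
The frequency rule above can be illustrated with a minimal sketch. It assumes the table is a plain nested dict and that the cluster membership (feature to centroid) is already known; `collapse_table` and both data structures are hypothetical and are not part of the plugin.

```python
from collections import defaultdict


def collapse_table(table, feature_to_centroid):
    """Sum per-sample frequencies of features that share a centroid.

    table: {sample_id: {feature_id: frequency}}  (hypothetical layout)
    feature_to_centroid: {feature_id: centroid_feature_id}
    """
    clustered = {}
    for sample_id, frequencies in table.items():
        summed = defaultdict(int)
        for feature_id, frequency in frequencies.items():
            # The clustered feature takes the centroid's identifier.
            summed[feature_to_centroid[feature_id]] += frequency
        clustered[sample_id] = dict(summed)
    return clustered


table = {
    "s1": {"f1": 10, "f2": 5, "f3": 2},
    "s2": {"f1": 0, "f2": 7, "f3": 1},
}
# f2 clusters with centroid f1; f3 is its own centroid.
feature_to_centroid = {"f1": "f1", "f2": "f1", "f3": "f3"}
clustered = collapse_table(table, feature_to_centroid)
print(clustered)
```

Here the clustered feature `f1` in sample `s1` carries the combined frequency of the original `f1` and `f2`.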

Citations

Rognes et al., 2016

Inputs

sequences: FeatureData[Sequence]

The sequences corresponding to the features in table. [required]

table: FeatureTable[Frequency]

The feature table to be clustered. [required]

Parameters

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True)

The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter. [required]

strand: Str % Choices('plus', 'both')

Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands. [default: 'plus']
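
As a rough illustration of what the two strand choices mean, the sketch below lists the orientations of a query that would be considered for each setting. The helper names are hypothetical, and the actual search is performed by vsearch, not by code like this.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def orientations(sequence, strand="plus"):
    """List the query orientations implied by a strand setting."""
    if strand == "plus":
        return [sequence]
    if strand == "both":
        return [sequence, reverse_complement(sequence)]
    raise ValueError("strand must be 'plus' or 'both'")


print(orientations("AACG", strand="both"))
```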

threads: Threads

The number of threads to use for computation. Passing 0 will launch one thread per CPU core. [default: 1]

Outputs

clustered_table: FeatureTable[Frequency]

The table following clustering of features. [required]

clustered_sequences: FeatureData[Sequence]

Sequences representing clustered features. [required]

Examples

cluster_features_de_novo

wget -O 'seqs1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'

wget -O 'table1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'

qiime vsearch cluster-features-de-novo \
  --i-sequences seqs1.qza \
  --i-table table1.qza \
  --p-perc-identity 0.97 \
  --p-strand plus \
  --p-threads 1 \
  --o-clustered-table clustered-table.qza \
  --o-clustered-sequences clustered-sequences.qza

vsearch cluster-features-closed-reference

Given a feature table and the associated feature sequences, cluster the features against a reference database based on a user-specified percent identity threshold of their sequences. This is not a general-purpose closed-reference clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table is clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.

Citations

Rognes et al., 2016

Inputs

sequences: FeatureData[Sequence]

The sequences corresponding to the features in table. [required]

table: FeatureTable[Frequency]

The feature table to be clustered. [required]

reference_sequences: FeatureData[Sequence]

The sequences to use as cluster centroids. [required]

Parameters

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True)

The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter. [required]

strand: Str % Choices('plus', 'both')

Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands. [default: 'plus']

threads: Threads

The number of threads to use for computation. Passing 0 will launch one thread per CPU core. [default: 1]

Outputs

clustered_table: FeatureTable[Frequency]

The table following clustering of features. [required]

clustered_sequences: FeatureData[Sequence]

The sequences representing clustered features, relabeled by the reference IDs. [required]

unmatched_sequences: FeatureData[Sequence]

The sequences which failed to match any reference sequences. This output maps to vsearch's --notmatched parameter. [required]
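
The split between clustered and unmatched outputs can be pictured with the following sketch. It assumes the reference search has already produced a `hits` mapping from feature ID to reference ID (absent when nothing matched); the function and its data layout are hypothetical and only mirror the output descriptions above.

```python
def split_by_reference_hits(sequences, hits):
    """Partition features into reference clusters and unmatched sequences.

    sequences: {feature_id: sequence}
    hits: {feature_id: reference_id} for features that matched a reference
    """
    clusters = {}   # reference_id -> member feature IDs
    unmatched = {}  # feature_id -> sequence, for features with no hit
    for feature_id, sequence in sequences.items():
        reference_id = hits.get(feature_id)
        if reference_id is None:
            unmatched[feature_id] = sequence
        else:
            # Matched features are relabeled by the reference ID.
            clusters.setdefault(reference_id, []).append(feature_id)
    return clusters, unmatched


sequences = {"f1": "ACGT", "f2": "ACGA", "f3": "TTTT"}
hits = {"f1": "ref9", "f2": "ref9"}
clusters, unmatched = split_by_reference_hits(sequences, hits)
print(clusters, unmatched)
```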


vsearch dereplicate-sequences

Dereplicate sequence data and create a feature table and feature representative sequences. Feature identifiers in the resulting artifacts will be the sha1 hash of the sequence defining each feature. If clustering of features into OTUs is desired, the resulting artifacts can be passed to the cluster_features_* methods in this plugin.
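
A minimal sketch of the idea, assuming reads are plain strings grouped by sample and that the hash is taken over the sequence text exactly as given (any normalization the plugin applies before hashing is not shown here). `dereplicate` is a hypothetical helper, not the plugin's implementation.

```python
import hashlib
from collections import Counter, defaultdict


def dereplicate(reads_by_sample):
    """Count identical reads per sample, keyed by the SHA-1 of the sequence."""
    table = defaultdict(Counter)  # sample_id -> feature_id -> count
    representatives = {}          # feature_id -> sequence
    for sample_id, reads in reads_by_sample.items():
        for read in reads:
            feature_id = hashlib.sha1(read.encode("utf-8")).hexdigest()
            representatives[feature_id] = read
            table[sample_id][feature_id] += 1
    return {s: dict(c) for s, c in table.items()}, representatives


reads_by_sample = {"s1": ["ACGT", "ACGT", "TTGA"], "s2": ["ACGT"]}
derep_table, derep_seqs = dereplicate(reads_by_sample)
acgt_id = hashlib.sha1(b"ACGT").hexdigest()
print(derep_table["s1"][acgt_id])
```

Because the identifier depends only on the sequence, the same sequence seen in different samples or runs maps to the same feature ID.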

Citations

Rognes et al., 2016

Inputs

sequences: SampleData[Sequences] | SampleData[SequencesWithQuality] | SampleData[JoinedSequencesWithQuality]

The sequences to be dereplicated. [required]

Parameters

derep_prefix: Bool

Merge sequences with identical prefixes. If a sequence is identical to the prefix of two or more longer sequences, it is clustered with the shortest of them. If they are equally long, it is clustered with the most abundant. [default: False]
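
The tie-breaking rule for prefix dereplication can be sketched as follows. This is only an illustration of the rule as stated above, using a hypothetical `prefix_target` helper and a dict of sequence abundances; vsearch's own implementation is not shown.

```python
def prefix_target(query, abundances):
    """Pick the longer sequence that `query` would be merged into.

    abundances: {sequence: abundance}
    Returns None when `query` is not a prefix of any longer sequence.
    """
    candidates = [
        seq for seq in abundances
        if len(seq) > len(query) and seq.startswith(query)
    ]
    if not candidates:
        return None
    # Shortest candidate first; among equally long ones, the most abundant.
    return min(candidates, key=lambda seq: (len(seq), -abundances[seq]))


abundances = {"ACGTAC": 3, "ACGTAG": 9, "ACGTACGG": 20, "ACG": 1}
print(prefix_target("ACG", abundances))
```

In this example `ACG` is a prefix of three longer sequences; the two shortest are equally long, so the more abundant one is chosen.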

min_seq_length: Int % Range(1, None)

Discard sequences shorter than this integer. [default: 1]

min_unique_size: Int % Range(1, None)

Discard sequences with a post-dereplication abundance value smaller than this integer. [default: 1]

Outputs

dereplicated_table: FeatureTable[Frequency]

The table of dereplicated sequences. [required]

dereplicated_sequences: FeatureData[Sequence]

The dereplicated sequences. [required]


vsearch merge-pairs

Merge paired-end sequence reads using vsearch's merge_pairs function. See the vsearch documentation for details on how paired-end merging is performed, and for more information on the parameters to this method.

Citations

Rognes et al., 2016

Inputs

demultiplexed_seqs: SampleData[PairedEndSequencesWithQuality]

The demultiplexed paired-end sequences to be merged. [required]

Parameters

truncqual: Int % Range(0, None)

Truncate sequences at the first base with the specified quality score value or lower. [optional]

minlen: Int % Range(0, None)

Sequences shorter than minlen after truncation are discarded. [default: 1]
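
The interplay of truncqual and minlen can be sketched like this. It assumes quality scores are already decoded to integers and that the first low-quality base is itself removed along with everything after it; `truncate_by_quality` is a hypothetical helper, not part of the plugin.

```python
def truncate_by_quality(sequence, qualities, truncqual=None, minlen=1):
    """Truncate a read at the first low-quality base, then apply minlen.

    Returns (sequence, qualities) after truncation, or None if the read
    is shorter than minlen and would be discarded.
    """
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have equal length")
    end = len(sequence)
    if truncqual is not None:
        for position, quality in enumerate(qualities):
            if quality <= truncqual:
                end = position  # drop this base and everything after it
                break
    if end < minlen:
        return None
    return sequence[:end], qualities[:end]


read = "ACGTAC"
quals = [30, 30, 28, 2, 30, 30]
print(truncate_by_quality(read, quals, truncqual=2))
```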

maxns: Int % Range(0, None)

Sequences with more than maxns N characters are discarded. [optional]

allowmergestagger: Bool

Allow merging of staggered read pairs. [default: False]

minovlen: Int % Range(5, None)

Minimum length of the area of overlap between reads during merging. [default: 10]

maxdiffs: Int % Range(0, None)

Maximum number of mismatches in the area of overlap during merging. [default: 10]

minmergelen: Int % Range(0, None)

Minimum length of the merged read to be retained. [optional]

maxmergelen: Int % Range(0, None)

Maximum length of the merged read to be retained. [optional]

maxee: Float % Range(0.0, None)

Maximum number of expected errors in the merged read to be retained. [optional]
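
For orientation, the expected number of errors in a read is conventionally the sum of the per-base error probabilities implied by its Phred scores, 10^(-Q/10). The sketch below applies that standard definition; the helper names are hypothetical, and treating a read exactly at the threshold as retained is an assumption.

```python
def expected_errors(qualities):
    """Sum of per-base error probabilities for integer Phred scores."""
    return sum(10 ** (-quality / 10) for quality in qualities)


def passes_maxee(qualities, maxee=None):
    """True if the read would be retained under a maxee threshold."""
    return maxee is None or expected_errors(qualities) <= maxee


quals = [10, 20, 30]  # error probabilities 0.1, 0.01, 0.001
print(expected_errors(quals))
```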

threads: Threads

The number of threads to use for computation. Does not scale much past 4 threads. [default: 1]

Outputs

merged_sequences: SampleData[JoinedSequencesWithQuality]

The merged sequences. [required]

unmerged_sequences: SampleData[PairedEndSequencesWithQuality]

The unmerged paired-end reads. [required]


vsearch uchime-ref

Apply the vsearch uchime_ref method to identify chimeric feature sequences. The results of this method can be used to filter chimeric features from the corresponding feature table. For additional details, please refer to the vsearch documentation.
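
The filtering step mentioned above amounts to keeping only the table columns whose feature IDs appear in the nonchimeras output. A minimal sketch on plain dicts, with a hypothetical `filter_table` helper (in practice a QIIME 2 feature-table filtering action would typically do this):

```python
def filter_table(table, nonchimera_ids):
    """Keep only features whose IDs are in the non-chimeric set.

    table: {sample_id: {feature_id: frequency}}  (hypothetical layout)
    """
    keep = set(nonchimera_ids)
    return {
        sample_id: {f: n for f, n in frequencies.items() if f in keep}
        for sample_id, frequencies in table.items()
    }


table = {"s1": {"f1": 10, "f2": 3}, "s2": {"f1": 4, "f2": 0, "f3": 8}}
filtered = filter_table(table, nonchimera_ids=["f1", "f3"])
print(filtered)
```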

Citations

Rognes et al., 2016

Inputs

sequences: FeatureData[Sequence]

The feature sequences to be chimera-checked. [required]

table: FeatureTable[Frequency]

Feature table (used for computing total feature abundances). [required]

reference_sequences: FeatureData[Sequence]

The non-chimeric reference sequences. [required]

Parameters

dn: Float % Range(0.0, None)

No vote pseudo-count, corresponding to the parameter n in the chimera scoring function. [default: 1.4]

mindiffs: Int % Range(1, None)

Minimum number of differences per segment. [default: 3]

mindiv: Float % Range(0.0, None)

Minimum divergence from closest parent. [default: 0.8]

minh: Float % Range(0.0, 1.0, inclusive_end=True)

Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity. [default: 0.28]

xn: Float % Range(1.0, None, inclusive_start=False)

No vote weight, corresponding to the parameter beta in the scoring function. [default: 8.0]
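
To show how dn, xn, and minh relate, here is a sketch of the scoring function as it is usually stated for UCHIME: each segment scores Y / (beta * (N + n) + A) from its yes, no, and abstain votes, and the two segment scores are multiplied. This form is recalled from the UCHIME description rather than taken from this page, so treat it as an assumption and check the vsearch documentation; the function names are hypothetical.

```python
def segment_score(yes, no, abstain, dn=1.4, xn=8.0):
    """Score one segment: Y / (beta * (N + n) + A), with n=dn and beta=xn."""
    return yes / (xn * (no + dn) + abstain)


def chimera_score(left_votes, right_votes, dn=1.4, xn=8.0):
    """Combine left and right segment scores by multiplication (assumed form)."""
    return segment_score(*left_votes, dn=dn, xn=xn) * segment_score(
        *right_votes, dn=dn, xn=xn
    )


def is_flagged(score, minh=0.28):
    """A candidate is flagged only if its score reaches the minh threshold."""
    return score >= minh


# Votes are (yes, no, abstain) tuples for each segment.
h = chimera_score((10, 0, 0), (10, 0, 0))
print(h, is_flagged(h))
```

Under this form, a larger xn penalizes each no vote more heavily, and a larger dn lowers scores for candidates supported by only a few yes votes.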

threads: Threads

The number of threads to use for computation. Passing 0 will launch one thread per CPU core. [default: 1]

Outputs

chimeras: FeatureData[Sequence]

The chimeric sequences. [required]

nonchimeras: FeatureData[Sequence]

The non-chimeric sequences. [required]

stats: UchimeStats

Summary statistics from chimera checking. [required]


vsearch uchime-denovo

Apply the vsearch uchime_denovo method to identify chimeric feature sequences. The results of this method can be used to filter chimeric features from the corresponding feature table. For more details, please refer to the vsearch documentation.

Citations

Rognes et al., 2016

Inputs

sequences: FeatureData[Sequence]

The feature sequences to be chimera-checked. [required]

table: FeatureTable[Frequency]

Feature table (used for computing total feature abundances). [required]

Parameters

dn: Float % Range(0.0, None)

No vote pseudo-count, corresponding to the parameter n in the chimera scoring function. [default: 1.4]

mindiffs: Int % Range(1, None)

Minimum number of differences per segment. [default: 3]

mindiv: Float % Range(0.0, None)

Minimum divergence from closest parent. [default: 0.8]

minh: Float % Range(0.0, 1.0, inclusive_end=True)

Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity. [default: 0.28]

xn: Float % Range(1.0, None, inclusive_start=False)

No vote weight, corresponding to the parameter beta in the scoring function. [default: 8.0]

Outputs

chimeras: FeatureData[Sequence]

The chimeric sequences. [required]

nonchimeras: FeatureData[Sequence]

The non-chimeric sequences. [required]

stats: UchimeStats

Summary statistics from chimera checking. [required]


vsearch fastq-stats

A fastq overview via vsearch's fastq_stats, fastq_eestats and fastq_eestats2 utilities. Please see https://github.com/torognes/vsearch for detailed documentation of these tools.

Citations

Rognes et al., 2016

Inputs

sequences: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]

Fastq sequences. [required]

Parameters

threads: Threads

The number of threads used for computation. [default: 1]

Outputs

visualization: Visualization

<no description> [required]


vsearch cluster-features-open-reference

Given a feature table and the associated feature sequences, cluster the features against a reference database based on a user-specified percent identity threshold of their sequences. Any sequences that don't match are then clustered de novo. This is not a general-purpose clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at.

When a group of features in the input table is clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. For features that match a reference sequence, the centroid feature is that reference sequence, so its identifier will become the feature identifier.

The clustered_sequences result will contain feature representative sequences that are derived from the sequences input for all features in clustered_table. This will always be the most abundant sequence in the cluster. The new_reference_sequences result will contain the entire reference database, plus feature representative sequences for any de novo features. This is intended to be used as a reference database in subsequent iterations of cluster_features_open_reference, if applicable. See the vsearch documentation for details on how sequence clustering is performed.
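
The two-stage logic described above can be outlined as follows. The sketch assumes both stages have already run and takes their results as plain dicts; `open_reference` and its arguments are hypothetical and only show how identifiers and the new reference set are assembled.

```python
def open_reference(sequences, reference, ref_hits, de_novo_centroids):
    """Combine closed-reference hits with de novo clusters of the remainder.

    sequences: {feature_id: sequence} for the input features
    reference: {reference_id: sequence}
    ref_hits: {feature_id: reference_id} for features matching the reference
    de_novo_centroids: {feature_id: centroid_feature_id} for the rest
    """
    assignment = {}
    for feature_id in sequences:
        if feature_id in ref_hits:
            # Matched features take the reference sequence's identifier.
            assignment[feature_id] = ref_hits[feature_id]
        else:
            # Unmatched features take their de novo centroid's identifier.
            assignment[feature_id] = de_novo_centroids[feature_id]
    # New reference = entire reference database + de novo representatives.
    new_reference = dict(reference)
    for centroid_id in set(de_novo_centroids.values()):
        new_reference[centroid_id] = sequences[centroid_id]
    return assignment, new_reference


sequences = {"f1": "ACGT", "f2": "TTTT", "f3": "TTTA"}
reference = {"ref1": "ACGT"}
assignment, new_reference = open_reference(
    sequences,
    reference,
    ref_hits={"f1": "ref1"},
    de_novo_centroids={"f2": "f2", "f3": "f2"},
)
print(assignment, new_reference)
```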

Citations

Rognes et al., 2016; Rideout et al., 2014

Inputs

sequences: FeatureData[Sequence]

The sequences corresponding to the features in table. [required]

table: FeatureTable[Frequency]

The feature table to be clustered. [required]

reference_sequences: FeatureData[Sequence]

The sequences to use as cluster centroids. [required]

Parameters

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True)

The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter. [required]

strand: Str % Choices('plus', 'both')

Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands. [default: 'plus']

threads: Threads

The number of threads to use for computation. Passing 0 will launch one thread per CPU core. [default: 1]

Outputs

clustered_table: FeatureTable[Frequency]

The table following clustering of features. [required]

clustered_sequences: FeatureData[Sequence]

Sequences representing clustered features. [required]

new_reference_sequences: FeatureData[Sequence]

The new reference sequences. This can be used for subsequent runs of open-reference clustering for consistent definitions of features across open-reference feature tables. [required]

This plugin wraps the vsearch application, and provides methods for clustering and dereplicating features and sequences.

version: 2024.10.0
website: https://github.com/qiime2/q2-vsearch
user support:
Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
citations:
Rognes et al., 2016

Actions

NameTypeShort Description
cluster-features-de-novomethodDe novo clustering of features.
cluster-features-closed-referencemethodClosed-reference clustering of features.
dereplicate-sequencesmethodDereplicate sequences.
merge-pairsmethodMerge paired-end reads.
uchime-refmethodReference-based chimera filtering with vsearch.
uchime-denovomethodDe novo chimera filtering with vsearch.
fastq-statsvisualizerFastq stats with vsearch.
cluster-features-open-referencepipelineOpen-reference clustering of features.

Artifact Classes

UchimeStats

Formats

UchimeStatsFmt
UchimeStatsDirFmt


vsearch cluster-features-de-novo

Given a feature table and the associated feature sequences, cluster the features based on user-specified percent identity threshold of their sequences. This is not a general-purpose de novo clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers and sequences will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.

Citations

Rognes et al., 2016

Inputs

sequences: FeatureData[Sequence]

The sequences corresponding to the features in table.[required]

table: FeatureTable[Frequency]

The feature table to be clustered.[required]

Parameters

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True)

The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]

strand: Str % Choices('plus', 'both')

Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']

threads: Threads

The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs

clustered_table: FeatureTable[Frequency]

The table following clustering of features.[required]

clustered_sequences: FeatureData[Sequence]

Sequences representing clustered features.[required]

Examples

cluster_features_de_novo

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
wget -O 'seqs1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'

wget -O 'table1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'

qiime vsearch cluster-features-de-novo \
  --i-sequences seqs1.qza \
  --i-table table1.qza \
  --p-perc-identity 0.97 \
  --p-strand plus \
  --p-threads 1 \
  --o-clustered-table clustered-table.qza \
  --o-clustered-sequences clustered-sequences.qza

vsearch cluster-features-closed-reference

Given a feature table and the associated feature sequences, cluster the features against a reference database based on user-specified percent identity threshold of their sequences. This is not a general-purpose closed-reference clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.

Citations

Rognes et al., 2016

Inputs

sequences: FeatureData[Sequence]

The sequences corresponding to the features in table.[required]

table: FeatureTable[Frequency]

The feature table to be clustered.[required]

reference_sequences: FeatureData[Sequence]

The sequences to use as cluster centroids.[required]

Parameters

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True)

The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]

strand: Str % Choices('plus', 'both')

Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']

threads: Threads

The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs

clustered_table: FeatureTable[Frequency]

The table following clustering of features.[required]

clustered_sequences: FeatureData[Sequence]

The sequences representing clustered features, relabeled by the reference IDs.[required]

unmatched_sequences: FeatureData[Sequence]

The sequences which failed to match any reference sequences. This output maps to vsearch's --notmatched parameter.[required]


vsearch dereplicate-sequences

Dereplicate sequence data and create a feature table and feature representative sequences. Feature identifiers in the resulting artifacts will be the sha1 hash of the sequence defining each feature. If clustering of features into OTUs is desired, the resulting artifacts can be passed to the cluster_features_* methods in this plugin.

Citations

Rognes et al., 2016

Inputs

sequences: SampleData[Sequences] | SampleData[SequencesWithQuality] | SampleData[JoinedSequencesWithQuality]

The sequences to be dereplicated.[required]

Parameters

derep_prefix: Bool

Merge sequences with identical prefixes. If a sequence is identical to the prefix of two or more longer sequences, it is clustered with the shortest of them. If they are equally long, it is clustered with the most abundant.[default: False]

min_seq_length: Int % Range(1, None)

Discard sequences shorter than this integer.[default: 1]

min_unique_size: Int % Range(1, None)

Discard sequences with a post-dereplication abundance value smaller than integer.[default: 1]

Outputs

dereplicated_table: FeatureTable[Frequency]

The table of dereplicated sequences.[required]

dereplicated_sequences: FeatureData[Sequence]

The dereplicated sequences.[required]


vsearch merge-pairs

Merge paired-end sequence reads using vsearch's merge_pairs function. See the vsearch documentation for details on how paired-end merging is performed, and for more information on the parameters to this method.

Citations

Rognes et al., 2016

Inputs

demultiplexed_seqs: SampleData[PairedEndSequencesWithQuality]

The demultiplexed paired-end sequences to be merged.[required]

Parameters

truncqual: Int % Range(0, None)

Truncate sequences at the first base with the specified quality score value or lower.[optional]

minlen: Int % Range(0, None)

Sequences shorter than minlen after truncation are discarded.[default: 1]

maxns: Int % Range(0, None)

Sequences with more than maxns N characters are discarded.[optional]

allowmergestagger: Bool

Allow merging of staggered read pairs.[default: False]

minovlen: Int % Range(5, None)

Minimum length of the area of overlap between reads during merging.[default: 10]

maxdiffs: Int % Range(0, None)

Maximum number of mismatches in the area of overlap during merging.[default: 10]

minmergelen: Int % Range(0, None)

Minimum length of the merged read to be retained.[optional]

maxmergelen: Int % Range(0, None)

Maximum length of the merged read to be retained.[optional]

maxee: Float % Range(0.0, None)

Maximum number of expected errors in the merged read to be retained.[optional]

threads: Threads

The number of threads to use for computation. Does not scale much past 4 threads.[default: 1]

Outputs

merged_sequences: SampleData[JoinedSequencesWithQuality]

The merged sequences.[required]

unmerged_sequences: SampleData[PairedEndSequencesWithQuality]

The unmerged paired-end reads.[required]


vsearch uchime-ref

Apply the vsearch uchime_ref method to identify chimeric feature sequences. The results of this method can be used to filter chimeric features from the corresponding feature table. For additional details, please refer to the vsearch documentation.

Citations

Rognes et al., 2016

Inputs

sequences: FeatureData[Sequence]

The feature sequences to be chimera-checked.[required]

table: FeatureTable[Frequency]

Feature table (used for computing total feature abundances).[required]

reference_sequences: FeatureData[Sequence]

The non-chimeric reference sequences.[required]

Parameters

dn: Float % Range(0.0, None)

No vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default: 1.4]

mindiffs: Int % Range(1, None)

Minimum number of differences per segment.[default: 3]

mindiv: Float % Range(0.0, None)

Minimum divergence from closest parent.[default: 0.8]

minh: Float % Range(0.0, 1.0, inclusive_end=True)

Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity.[default: 0.28]

xn: Float % Range(1.0, None, inclusive_start=False)

No vote weight, corresponding to the parameter beta in the scoring function.[default: 8.0]

threads: Threads

The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs

chimeras: FeatureData[Sequence]

The chimeric sequences.[required]

nonchimeras: FeatureData[Sequence]

The non-chimeric sequences.[required]

stats: UchimeStats

Summary statistics from chimera checking.[required]


vsearch uchime-denovo

Apply the vsearch uchime_denovo method to identify chimeric feature sequences. The results of this method can be used to filter chimeric features from the corresponding feature table. For more details, please refer to the vsearch documentation.

Citations

Rognes et al., 2016

Inputs

sequences: FeatureData[Sequence]

The feature sequences to be chimera-checked.[required]

table: FeatureTable[Frequency]

Feature table (used for computing total feature abundances).[required]

Parameters

dn: Float % Range(0.0, None)

No vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default: 1.4]

mindiffs: Int % Range(1, None)

Minimum number of differences per segment.[default: 3]

mindiv: Float % Range(0.0, None)

Minimum divergence from closest parent.[default: 0.8]

minh: Float % Range(0.0, 1.0, inclusive_end=True)

Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity.[default: 0.28]

xn: Float % Range(1.0, None, inclusive_start=False)

No vote weight, corresponding to the parameter beta in the scoring function.[default: 8.0]

Outputs

chimeras: FeatureData[Sequence]

The chimeric sequences.[required]

nonchimeras: FeatureData[Sequence]

The non-chimeric sequences.[required]

stats: UchimeStats

Summary statistics from chimera checking.[required]


vsearch fastq-stats

A fastq overview via vsearch's fastq_stats, fastq_eestats and fastq_eestats2 utilities. Please see https://github.com/torognes/vsearch for detailed documentation of these tools.

Citations

Rognes et al., 2016

Inputs

sequences: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]

Fastq sequences[required]

Parameters

threads: Threads

The number of threads used for computation.[default: 1]

Outputs

visualization: Visualization

<no description>[required]


vsearch cluster-features-open-reference

Given a feature table and the associated feature sequences, cluster the features against a reference database based on user-specified percent identity threshold of their sequences. Any sequences that don't match are then clustered de novo. This is not a general-purpose clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. For features that match a reference sequence, the centroid feature is that reference sequence, so its identifier will become the feature identifier. The clustered_sequences result will contain feature representative sequences that are derived from the sequences input for all features in clustered_table. This will always be the most abundant sequence in the cluster. The new_reference_sequences result will contain the entire reference database, plus feature representative sequences for any de novo features. This is intended to be used as a reference database in subsequent iterations of cluster_features_open_reference, if applicable. See the vsearch documentation for details on how sequence clustering is performed.

Citations

Rognes et al., 2016; Rideout et al., 2014

Inputs

sequences: FeatureData[Sequence]

The sequences corresponding to the features in table.[required]

table: FeatureTable[Frequency]

The feature table to be clustered.[required]

reference_sequences: FeatureData[Sequence]

The sequences to use as cluster centroids.[required]

Parameters

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True)

The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]

strand: Str % Choices('plus', 'both')

Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']

threads: Threads

The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs

clustered_table: FeatureTable[Frequency]

The table following clustering of features.[required]

clustered_sequences: FeatureData[Sequence]

Sequences representing clustered features.[required]

new_reference_sequences: FeatureData[Sequence]

The new reference sequences. This can be used for subsequent runs of open-reference clustering for consistent definitions of features across open-reference feature tables.[required]

This plugin wraps the vsearch application, and provides methods for clustering and dereplicating features and sequences.

version: 2024.10.0
website: https://github.com/qiime2/q2-vsearch
user support:
Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
citations:
Rognes et al., 2016

Actions

NameTypeShort Description
cluster-features-de-novomethodDe novo clustering of features.
cluster-features-closed-referencemethodClosed-reference clustering of features.
dereplicate-sequencesmethodDereplicate sequences.
merge-pairsmethodMerge paired-end reads.
uchime-refmethodReference-based chimera filtering with vsearch.
uchime-denovomethodDe novo chimera filtering with vsearch.
fastq-statsvisualizerFastq stats with vsearch.
cluster-features-open-referencepipelineOpen-reference clustering of features.

Artifact Classes

UchimeStats

Formats

UchimeStatsFmt
UchimeStatsDirFmt


vsearch cluster-features-de-novo

Given a feature table and the associated feature sequences, cluster the features based on user-specified percent identity threshold of their sequences. This is not a general-purpose de novo clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers and sequences will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.

Citations

Rognes et al., 2016

Inputs

sequences: FeatureData[Sequence]

The sequences corresponding to the features in table.[required]

table: FeatureTable[Frequency]

The feature table to be clustered.[required]

Parameters

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True)

The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]

strand: Str % Choices('plus', 'both')

Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']

threads: Threads

The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs

clustered_table: FeatureTable[Frequency]

The table following clustering of features.[required]

clustered_sequences: FeatureData[Sequence]

Sequences representing clustered features.[required]

Examples

cluster_features_de_novo

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
wget -O 'seqs1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'

wget -O 'table1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'

qiime vsearch cluster-features-de-novo \
  --i-sequences seqs1.qza \
  --i-table table1.qza \
  --p-perc-identity 0.97 \
  --p-strand plus \
  --p-threads 1 \
  --o-clustered-table clustered-table.qza \
  --o-clustered-sequences clustered-sequences.qza

vsearch cluster-features-closed-reference

Given a feature table and the associated feature sequences, cluster the features against a reference database based on a user-specified percent identity threshold of their sequences. This is not a general-purpose closed-reference clustering method; it is intended for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table is clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies, in that sample, of the features that were clustered together. Feature identifiers are inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.

Citations

Rognes et al., 2016

Inputs

sequences: FeatureData[Sequence]

The sequences corresponding to the features in table.[required]

table: FeatureTable[Frequency]

The feature table to be clustered.[required]

reference_sequences: FeatureData[Sequence]

The sequences to use as cluster centroids.[required]

Parameters

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True)

The percent identity at which clustering should be performed, expressed as a fraction in the range (0, 1] (for example, 0.97 for 97% identity). This parameter maps to vsearch's --id parameter.[required]

strand: Str % Choices('plus', 'both')

Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']

threads: Threads

The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs

clustered_table: FeatureTable[Frequency]

The table following clustering of features.[required]

clustered_sequences: FeatureData[Sequence]

The sequences representing clustered features, relabeled by the reference IDs.[required]

unmatched_sequences: FeatureData[Sequence]

The sequences which failed to match any reference sequences. This output maps to vsearch's --notmatched parameter.[required]
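
The relationship between the three outputs can be sketched as follows. This is a simplified illustration, not the plugin's code: the hits mapping (feature ID to matched reference ID, or None) is a hypothetical stand-in for the search results, and the sketch assumes that features without a match are left out of the clustered table and reported separately.

```python
def closed_reference_collapse(table, hits):
    """Relabel matched features by reference ID and set aside the rest.

    table: {feature_id: {sample_id: frequency}}
    hits: {feature_id: reference_id or None}  (hypothetical search result)
    """
    clustered = {}
    unmatched = []
    for feature_id, sample_counts in table.items():
        ref_id = hits.get(feature_id)
        if ref_id is None:
            # No reference sequence met the identity threshold.
            unmatched.append(feature_id)
            continue
        target = clustered.setdefault(ref_id, {})
        for sample_id, freq in sample_counts.items():
            target[sample_id] = target.get(sample_id, 0) + freq
    return clustered, unmatched


table = {
    "f1": {"s1": 10, "s2": 0},
    "f2": {"s1": 3, "s2": 7},
    "f3": {"s1": 1, "s2": 1},
}
hits = {"f1": "refA", "f2": "refA", "f3": None}
clustered, unmatched = closed_reference_collapse(table, hits)
```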


vsearch dereplicate-sequences

Dereplicate sequence data and create a feature table and feature representative sequences. Feature identifiers in the resulting artifacts will be the SHA-1 hash of the sequence defining each feature. If clustering of features into OTUs is desired, the resulting artifacts can be passed to the cluster_features_* methods in this plugin.
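
A minimal sketch of the idea, using only the Python standard library: identical reads are counted per sample and labeled with the SHA-1 hex digest of their sequence. The input layout and the function name are hypothetical, and details such as case handling or the exact bytes that are hashed are assumptions here rather than a description of the plugin's internals.

```python
import hashlib
from collections import defaultdict


def dereplicate(reads_by_sample):
    """Count identical reads per sample, keyed by the SHA-1 of the sequence.

    reads_by_sample: {sample_id: [sequence, ...]}
    Returns (table, representative_sequences).
    """
    table = defaultdict(lambda: defaultdict(int))
    rep_seqs = {}
    for sample_id, reads in reads_by_sample.items():
        for seq in reads:
            feature_id = hashlib.sha1(seq.encode("utf-8")).hexdigest()
            rep_seqs[feature_id] = seq
            table[feature_id][sample_id] += 1
    return {fid: dict(counts) for fid, counts in table.items()}, rep_seqs


reads = {"s1": ["ACGT", "ACGT", "TTGA"], "s2": ["ACGT"]}
derep_table, derep_seqs = dereplicate(reads)
```

Because the identifier is derived from the sequence itself, the same sequence receives the same feature ID in every run.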

Citations

Rognes et al., 2016

Inputs

sequences: SampleData[Sequences] | SampleData[SequencesWithQuality] | SampleData[JoinedSequencesWithQuality]

The sequences to be dereplicated.[required]

Parameters

derep_prefix: Bool

Merge sequences with identical prefixes. If a sequence is identical to the prefix of two or more longer sequences, it is clustered with the shortest of them. If they are equally long, it is clustered with the most abundant.[default: False]

min_seq_length: Int % Range(1, None)

Discard sequences shorter than this integer.[default: 1]

min_unique_size: Int % Range(1, None)

Discard sequences with a post-dereplication abundance value smaller than this integer.[default: 1]
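
The tie-breaking rule stated for derep_prefix can be sketched as below. This is an illustration of the rule as worded above, not vsearch's algorithm; prefix_target and the candidates mapping are hypothetical names.

```python
def prefix_target(query, candidates):
    """Pick the sequence that a shorter query would be merged into.

    candidates: {sequence: abundance}
    Returns the chosen longer sequence, or None if query is not a prefix
    of any longer candidate.
    """
    longer = [
        seq for seq in candidates
        if len(seq) > len(query) and seq.startswith(query)
    ]
    if not longer:
        return None
    # Shortest candidate first; among equally long ones, most abundant.
    return min(longer, key=lambda seq: (len(seq), -candidates[seq]))


candidates = {"ACGTA": 2, "ACGTC": 5, "ACGTAAA": 9}
chosen = prefix_target("ACGT", candidates)
```

Here ACGTA and ACGTC are equally short, so the more abundant ACGTC is chosen even though ACGTAAA has the highest abundance overall.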

Outputs

dereplicated_table: FeatureTable[Frequency]

The table of dereplicated sequences.[required]

dereplicated_sequences: FeatureData[Sequence]

The dereplicated sequences.[required]


vsearch merge-pairs

Merge paired-end sequence reads using vsearch's merge_pairs function. See the vsearch documentation for details on how paired-end merging is performed, and for more information on the parameters to this method.

Citations

Rognes et al., 2016

Inputs

demultiplexed_seqs: SampleData[PairedEndSequencesWithQuality]

The demultiplexed paired-end sequences to be merged.[required]

Parameters

truncqual: Int % Range(0, None)

Truncate sequences at the first base with the specified quality score value or lower.[optional]

minlen: Int % Range(0, None)

Sequences shorter than minlen after truncation are discarded.[default: 1]

maxns: Int % Range(0, None)

Sequences with more than maxns N characters are discarded.[optional]

allowmergestagger: Bool

Allow merging of staggered read pairs.[default: False]

minovlen: Int % Range(5, None)

Minimum length of the area of overlap between reads during merging.[default: 10]

maxdiffs: Int % Range(0, None)

Maximum number of mismatches in the area of overlap during merging.[default: 10]

minmergelen: Int % Range(0, None)

Minimum length of the merged read to be retained.[optional]

maxmergelen: Int % Range(0, None)

Maximum length of the merged read to be retained.[optional]

maxee: Float % Range(0.0, None)

Maximum number of expected errors in the merged read to be retained.[optional]

threads: Threads

The number of threads to use for computation. Does not scale much past 4 threads.[default: 1]
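
Two of the quality-based parameters above can be illustrated with short helper functions. This is a sketch under stated assumptions: quality values are taken to be Phred scores already decoded to integers, the function names are hypothetical, and the expected-error count uses the standard Phred relationship (error probability 10^(-Q/10) per base). How vsearch derives quality scores for the merged read is not modeled.

```python
def truncate_at_quality(seq, quals, truncqual):
    """Cut a read at the first base whose quality is <= truncqual."""
    for i, q in enumerate(quals):
        if q <= truncqual:
            return seq[:i], quals[:i]
    return seq, quals


def expected_errors(quals):
    """Sum of per-base error probabilities implied by Phred scores."""
    return sum(10 ** (-q / 10) for q in quals)


# A read whose fourth base has quality 2 is cut to its first three bases.
trimmed_seq, trimmed_quals = truncate_at_quality(
    "ACGTAC", [30, 30, 12, 2, 30, 30], truncqual=2
)

# Three bases with Q10, Q20 and Q30 contribute 0.1 + 0.01 + 0.001 errors.
ee = expected_errors([10, 20, 30])
```

A read would then be compared against minlen after truncation, and a merged read against maxee, if those parameters are set.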

Outputs

merged_sequences: SampleData[JoinedSequencesWithQuality]

The merged sequences.[required]

unmerged_sequences: SampleData[PairedEndSequencesWithQuality]

The unmerged paired-end reads.[required]


vsearch uchime-ref

Apply the vsearch uchime_ref method to identify chimeric feature sequences. The results of this method can be used to filter chimeric features from the corresponding feature table. For additional details, please refer to the vsearch documentation.

Citations

Rognes et al., 2016

Inputs

sequences: FeatureData[Sequence]

The feature sequences to be chimera-checked.[required]

table: FeatureTable[Frequency]

Feature table (used for computing total feature abundances).[required]

reference_sequences: FeatureData[Sequence]

The non-chimeric reference sequences.[required]

Parameters

dn: Float % Range(0.0, None)

No vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default: 1.4]

mindiffs: Int % Range(1, None)

Minimum number of differences per segment.[default: 3]

mindiv: Float % Range(0.0, None)

Minimum divergence from closest parent.[default: 0.8]

minh: Float % Range(0.0, 1.0, inclusive_end=True)

Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity.[default: 0.28]

xn: Float % Range(1.0, None, inclusive_start=False)

No vote weight, corresponding to the parameter beta in the scoring function.[default: 8.0]

threads: Threads

The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]
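
The roles of dn, xn and minh can be sketched numerically. The code below assumes the scoring function described in the UCHIME publication (Edgar et al., 2011), in which each segment scores Y / (beta * (N + n) + A) for Y yes votes, N no votes and A abstain votes, and the overall score is the product of the left and right segment scores. That form, and the vote counts used here, are assumptions of this sketch and are not taken from this page; consult the vsearch documentation for the authoritative definition.

```python
def segment_score(yes, no, abstain, xn=8.0, dn=1.4):
    """Score one segment: yes / (xn * (no + dn) + abstain)."""
    return yes / (xn * (no + dn) + abstain)


def chimera_score(left, right, xn=8.0, dn=1.4):
    """Combine the left and right segment scores by multiplication.

    left, right: (yes, no, abstain) vote counts for each segment.
    """
    return (
        segment_score(*left, xn=xn, dn=dn)
        * segment_score(*right, xn=xn, dn=dn)
    )


MINH = 0.28  # default value of the minh parameter

# Hypothetical vote counts: strong support on both segments.
strong = chimera_score(left=(10, 0, 0), right=(8, 0, 1))
# Hypothetical vote counts: few yes votes and one no vote.
weak = chimera_score(left=(3, 1, 0), right=(3, 0, 0))

strong_passes = strong >= MINH
weak_passes = weak >= MINH
```

Under these assumptions, a larger xn penalizes each no vote more heavily, and a larger dn lowers every score, which is why both tend to make chimera calls more conservative.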

Outputs

chimeras: FeatureData[Sequence]

The chimeric sequences.[required]

nonchimeras: FeatureData[Sequence]

The non-chimeric sequences.[required]

stats: UchimeStats

Summary statistics from chimera checking.[required]
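
Filtering the feature table with the nonchimeras output amounts to keeping only the features whose identifiers appear in it. A minimal sketch with plain dictionaries follows; the data layout and the name filter_table are hypothetical, and in practice this step would be done with QIIME 2's own filtering tools rather than by hand.

```python
def filter_table(table, keep_ids):
    """Keep only the features whose IDs are in keep_ids.

    table: {feature_id: {sample_id: frequency}}
    keep_ids: iterable of feature IDs (e.g. the non-chimeric features)
    """
    keep = set(keep_ids)
    return {fid: counts for fid, counts in table.items() if fid in keep}


table = {
    "f1": {"s1": 10, "s2": 4},
    "f2": {"s1": 3, "s2": 7},
    "f3": {"s1": 1, "s2": 1},
}
nonchimera_ids = ["f1", "f3"]  # hypothetical IDs from the nonchimeras output
filtered = filter_table(table, nonchimera_ids)
```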


vsearch uchime-denovo

Apply the vsearch uchime_denovo method to identify chimeric feature sequences. The results of this method can be used to filter chimeric features from the corresponding feature table. For more details, please refer to the vsearch documentation.

Citations

Rognes et al., 2016

Inputs

sequences: FeatureData[Sequence]

The feature sequences to be chimera-checked.[required]

table: FeatureTable[Frequency]

Feature table (used for computing total feature abundances).[required]

Parameters

dn: Float % Range(0.0, None)

No vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default: 1.4]

mindiffs: Int % Range(1, None)

Minimum number of differences per segment.[default: 3]

mindiv: Float % Range(0.0, None)

Minimum divergence from closest parent.[default: 0.8]

minh: Float % Range(0.0, 1.0, inclusive_end=True)

Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity.[default: 0.28]

xn: Float % Range(1.0, None, inclusive_start=False)

No vote weight, corresponding to the parameter beta in the scoring function.[default: 8.0]

Outputs

chimeras: FeatureData[Sequence]

The chimeric sequences.[required]

nonchimeras: FeatureData[Sequence]

The non-chimeric sequences.[required]

stats: UchimeStats

Summary statistics from chimera checking.[required]


vsearch fastq-stats

Generate an overview of FASTQ data using vsearch's fastq_stats, fastq_eestats and fastq_eestats2 utilities. See https://github.com/torognes/vsearch for detailed documentation of these tools.

Citations

Rognes et al., 2016

Inputs

sequences: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]

The FASTQ sequences to summarize.[required]

Parameters

threads: Threads

The number of threads used for computation.[default: 1]

Outputs

visualization: Visualization

<no description>[required]


vsearch cluster-features-open-reference

Given a feature table and the associated feature sequences, cluster the features against a reference database based on a user-specified percent identity threshold of their sequences. Any sequences that don't match are then clustered de novo. This is not a general-purpose clustering method; it is intended for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table is clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies, in that sample, of the features that were clustered together. Feature identifiers are inherited from the centroid feature of each cluster. For features that match a reference sequence, the centroid feature is that reference sequence, so its identifier becomes the feature identifier. The clustered_sequences result contains feature representative sequences, derived from the sequences input, for all features in clustered_table. Each representative is always the most abundant sequence in its cluster. The new_reference_sequences result contains the entire reference database, plus feature representative sequences for any de novo features. It is intended to be used as the reference database in subsequent iterations of cluster_features_open_reference, if applicable. See the vsearch documentation for details on how sequence clustering is performed.

Citations

Rognes et al., 2016; Rideout et al., 2014

Inputs

sequences: FeatureData[Sequence]

The sequences corresponding to the features in table.[required]

table: FeatureTable[Frequency]

The feature table to be clustered.[required]

reference_sequences: FeatureData[Sequence]

The sequences to use as cluster centroids.[required]

Parameters

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True)

The percent identity at which clustering should be performed, expressed as a fraction in the range (0, 1] (for example, 0.97 for 97% identity). This parameter maps to vsearch's --id parameter.[required]

strand: Str % Choices('plus', 'both')

Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']

threads: Threads

The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs

clustered_table: FeatureTable[Frequency]

The table following clustering of features.[required]

clustered_sequences: FeatureData[Sequence]

Sequences representing clustered features.[required]

new_reference_sequences: FeatureData[Sequence]

The new reference sequences. This can be used for subsequent runs of open-reference clustering for consistent definitions of features across open-reference feature tables.[required]
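
How new_reference_sequences relates to the other results can be sketched as follows, based on the description of this pipeline: the whole reference database is retained, and representative sequences are added only for features whose identifiers are not reference identifiers (the de novo features). The dictionaries and the name build_new_reference are hypothetical; this is an illustration, not the pipeline's code.

```python
def build_new_reference(reference_seqs, clustered_seqs):
    """Combine the full reference with representatives of de novo features.

    reference_seqs: {reference_id: sequence}
    clustered_seqs: {feature_id: representative sequence}
    """
    new_reference = dict(reference_seqs)
    for feature_id, seq in clustered_seqs.items():
        if feature_id not in reference_seqs:
            # De novo feature: not present in the original reference.
            new_reference[feature_id] = seq
    return new_reference


reference = {"refA": "ACGT", "refB": "TTTT"}
# refA matched a reference; denovo1 was clustered de novo.
clustered_sequences = {"refA": "ACGA", "denovo1": "GGGG"}
new_reference = build_new_reference(reference, clustered_sequences)
```

Note that for refA the reference sequence is kept unchanged in the new reference, even though the representative sequence in clustered_sequences comes from the input data.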

This plugin wraps the vsearch application, and provides methods for clustering and dereplicating features and sequences.

version: 2024.10.0
website: https://github.com/qiime2/q2-vsearch
user support:
Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
citations:
Rognes et al., 2016

Actions

NameTypeShort Description
cluster-features-de-novomethodDe novo clustering of features.
cluster-features-closed-referencemethodClosed-reference clustering of features.
dereplicate-sequencesmethodDereplicate sequences.
merge-pairsmethodMerge paired-end reads.
uchime-refmethodReference-based chimera filtering with vsearch.
uchime-denovomethodDe novo chimera filtering with vsearch.
fastq-statsvisualizerFastq stats with vsearch.
cluster-features-open-referencepipelineOpen-reference clustering of features.

Artifact Classes

UchimeStats

Formats

UchimeStatsFmt
UchimeStatsDirFmt


vsearch cluster-features-de-novo

Given a feature table and the associated feature sequences, cluster the features based on user-specified percent identity threshold of their sequences. This is not a general-purpose de novo clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers and sequences will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.

Citations

Rognes et al., 2016

Inputs

sequences: FeatureData[Sequence]

The sequences corresponding to the features in table.[required]

table: FeatureTable[Frequency]

The feature table to be clustered.[required]

Parameters

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True)

The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]

strand: Str % Choices('plus', 'both')

Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']

threads: Threads

The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs

clustered_table: FeatureTable[Frequency]

The table following clustering of features.[required]

clustered_sequences: FeatureData[Sequence]

Sequences representing clustered features.[required]

Examples

cluster_features_de_novo

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
wget -O 'seqs1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'

wget -O 'table1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'

qiime vsearch cluster-features-de-novo \
  --i-sequences seqs1.qza \
  --i-table table1.qza \
  --p-perc-identity 0.97 \
  --p-strand plus \
  --p-threads 1 \
  --o-clustered-table clustered-table.qza \
  --o-clustered-sequences clustered-sequences.qza

vsearch cluster-features-closed-reference

Given a feature table and the associated feature sequences, cluster the features against a reference database based on user-specified percent identity threshold of their sequences. This is not a general-purpose closed-reference clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.

Citations

Rognes et al., 2016

Inputs

sequences: FeatureData[Sequence]

The sequences corresponding to the features in table.[required]

table: FeatureTable[Frequency]

The feature table to be clustered.[required]

reference_sequences: FeatureData[Sequence]

The sequences to use as cluster centroids.[required]

Parameters

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True)

The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]

strand: Str % Choices('plus', 'both')

Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']

threads: Threads

The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs

clustered_table: FeatureTable[Frequency]

The table following clustering of features.[required]

clustered_sequences: FeatureData[Sequence]

The sequences representing clustered features, relabeled by the reference IDs.[required]

unmatched_sequences: FeatureData[Sequence]

The sequences which failed to match any reference sequences. This output maps to vsearch's --notmatched parameter.[required]


vsearch dereplicate-sequences

Dereplicate sequence data and create a feature table and feature representative sequences. Feature identifiers in the resulting artifacts will be the sha1 hash of the sequence defining each feature. If clustering of features into OTUs is desired, the resulting artifacts can be passed to the cluster_features_* methods in this plugin.

Citations

Rognes et al., 2016

Inputs

sequences: SampleData[Sequences] | SampleData[SequencesWithQuality] | SampleData[JoinedSequencesWithQuality]

The sequences to be dereplicated.[required]

Parameters

derep_prefix: Bool

Merge sequences with identical prefixes. If a sequence is identical to the prefix of two or more longer sequences, it is clustered with the shortest of them. If they are equally long, it is clustered with the most abundant.[default: False]

min_seq_length: Int % Range(1, None)

Discard sequences shorter than this integer.[default: 1]

min_unique_size: Int % Range(1, None)

Discard sequences with a post-dereplication abundance value smaller than this integer.[default: 1]

Outputs

dereplicated_table: FeatureTable[Frequency]

The table of dereplicated sequences.[required]

dereplicated_sequences: FeatureData[Sequence]

The dereplicated sequences.[required]
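
As a rough illustration of how exact-match dereplication and sha1-based feature identifiers fit together, the following Python sketch counts identical sequences and names each feature by the sha1 hex digest of its sequence. The function name, the input layout, and the choice to apply `min_unique_size` to abundances pooled across all samples are assumptions of this sketch; prefix dereplication (`derep_prefix`) is not modeled.

```python
import hashlib
from collections import Counter, defaultdict


def dereplicate(reads_by_sample, min_seq_length=1, min_unique_size=1):
    """Toy exact-match dereplication.

    reads_by_sample: {sample_id: [sequence, ...]} (hypothetical layout).
    Returns (table, rep_seqs) where table is {sample_id: {feature_id: count}}
    and rep_seqs is {feature_id: sequence}.
    """
    totals = Counter()
    per_sample = defaultdict(Counter)
    for sample, reads in reads_by_sample.items():
        for seq in reads:
            if len(seq) < min_seq_length:
                continue  # too short, discarded before counting
            totals[seq] += 1
            per_sample[sample][seq] += 1

    # Keep only sequences whose pooled abundance reaches the threshold.
    kept = {seq for seq, n in totals.items() if n >= min_unique_size}
    feature_id = {
        seq: hashlib.sha1(seq.encode("ascii")).hexdigest() for seq in kept
    }
    table = {
        sample: {feature_id[s]: n for s, n in counts.items() if s in kept}
        for sample, counts in per_sample.items()
    }
    rep_seqs = {fid: seq for seq, fid in feature_id.items()}
    return table, rep_seqs


example_reads = {"s1": ["ACGT", "ACGT", "GG"], "s2": ["ACGT", "TTTT"]}
example_derep_table, example_rep_seqs = dereplicate(
    example_reads, min_seq_length=3, min_unique_size=2
)
```

In this example only `ACGT` survives: `GG` is shorter than `min_seq_length`, and `TTTT` occurs once, below `min_unique_size`.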


vsearch merge-pairs

Merge paired-end sequence reads using vsearch's merge_pairs function. See the vsearch documentation for details on how paired-end merging is performed, and for more information on the parameters to this method.

Citations

Rognes et al., 2016

Inputs

demultiplexed_seqs: SampleData[PairedEndSequencesWithQuality]

The demultiplexed paired-end sequences to be merged.[required]

Parameters

truncqual: Int % Range(0, None)

Truncate sequences at the first base with the specified quality score value or lower.[optional]

minlen: Int % Range(0, None)

Sequences shorter than minlen after truncation are discarded.[default: 1]

maxns: Int % Range(0, None)

Sequences with more than maxns N characters are discarded.[optional]

allowmergestagger: Bool

Allow merging of staggered read pairs.[default: False]

minovlen: Int % Range(5, None)

Minimum length of the area of overlap between reads during merging.[default: 10]

maxdiffs: Int % Range(0, None)

Maximum number of mismatches in the area of overlap during merging.[default: 10]

minmergelen: Int % Range(0, None)

Minimum length of the merged read to be retained.[optional]

maxmergelen: Int % Range(0, None)

Maximum length of the merged read to be retained.[optional]

maxee: Float % Range(0.0, None)

Maximum number of expected errors in the merged read to be retained.[optional]

threads: Threads

The number of threads to use for computation. Does not scale much past 4 threads.[default: 1]

Outputs

merged_sequences: SampleData[JoinedSequencesWithQuality]

The merged sequences.[required]

unmerged_sequences: SampleData[PairedEndSequencesWithQuality]

The unmerged paired-end reads.[required]
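
To make the quality-related parameters more concrete, here is a small Python sketch of three of the checks described above: truncation at `truncqual`, the `minlen` and `maxns` filters, and the expected-error count that `maxee` is compared against. It assumes Phred quality scores, where a score Q corresponds to an error probability of 10^(-Q/10); the function names are hypothetical, and the order in which vsearch applies these steps is not taken from the source.

```python
def truncate_at_quality(seq, quals, truncqual):
    """Cut the read at the first base whose quality is truncqual or lower."""
    for i, q in enumerate(quals):
        if q <= truncqual:
            return seq[:i], quals[:i]
    return seq, quals


def passes_read_filters(seq, minlen=1, maxns=None):
    """Apply the minlen and maxns checks to a (possibly truncated) read."""
    if len(seq) < minlen:
        return False
    if maxns is not None and seq.count("N") > maxns:
        return False
    return True


def expected_errors(quals):
    """Sum of per-base error probabilities, p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in quals)


example_seq, example_quals = truncate_at_quality(
    "ACGTAC", [30, 30, 2, 30, 30, 30], truncqual=2
)
```

A merged read would then be retained under `maxee` only if `expected_errors` of its quality scores does not exceed that value.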


vsearch uchime-ref

Apply the vsearch uchime_ref method to identify chimeric feature sequences. The results of this method can be used to filter chimeric features from the corresponding feature table. For additional details, please refer to the vsearch documentation.

Citations

Rognes et al., 2016

Inputs

sequences: FeatureData[Sequence]

The feature sequences to be chimera-checked.[required]

table: FeatureTable[Frequency]

Feature table (used for computing total feature abundances).[required]

reference_sequences: FeatureData[Sequence]

The non-chimeric reference sequences.[required]

Parameters

dn: Float % Range(0.0, None)

No vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default: 1.4]

mindiffs: Int % Range(1, None)

Minimum number of differences per segment.[default: 3]

mindiv: Float % Range(0.0, None)

Minimum divergence from closest parent.[default: 0.8]

minh: Float % Range(0.0, 1.0, inclusive_end=True)

Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity.[default: 0.28]

xn: Float % Range(1.0, None, inclusive_start=False)

No vote weight, corresponding to the parameter beta in the scoring function.[default: 8.0]

threads: Threads

The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs

chimeras: FeatureData[Sequence]

The chimeric sequences.[required]

nonchimeras: FeatureData[Sequence]

The non-chimeric sequences.[required]

stats: UchimeStats

Summary statistics from chimera checking.[required]
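
The description above notes that the results can be used to filter chimeric features from the feature table. Conceptually that is a subset operation on feature identifiers, sketched below in plain Python; the nested-dict table layout and the function name are hypothetical and stand in for the real artifacts.

```python
def filter_table_to_nonchimeras(table, nonchimera_ids):
    """Keep only the features whose IDs appear among the non-chimeric sequences.

    table: {sample_id: {feature_id: count}} (hypothetical layout).
    nonchimera_ids: iterable of feature IDs from the nonchimeras output.
    """
    keep = set(nonchimera_ids)
    return {
        sample: {fid: n for fid, n in counts.items() if fid in keep}
        for sample, counts in table.items()
    }


example_counts = {"s1": {"a": 10, "b": 3}, "s2": {"a": 4, "b": 1, "c": 8}}
example_filtered = filter_table_to_nonchimeras(example_counts, ["a", "c"])
```

In a QIIME 2 workflow this step is typically done with the feature-table plugin's filter-features action rather than by hand.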


vsearch uchime-denovo

Apply the vsearch uchime_denovo method to identify chimeric feature sequences. The results of this method can be used to filter chimeric features from the corresponding feature table. For more details, please refer to the vsearch documentation.

Citations

Rognes et al., 2016

Inputs

sequences: FeatureData[Sequence]

The feature sequences to be chimera-checked.[required]

table: FeatureTable[Frequency]

Feature table (used for computing total feature abundances).[required]

Parameters

dn: Float % Range(0.0, None)

No vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default: 1.4]

mindiffs: Int % Range(1, None)

Minimum number of differences per segment.[default: 3]

mindiv: Float % Range(0.0, None)

Minimum divergence from closest parent.[default: 0.8]

minh: Float % Range(0.0, 1.0, inclusive_end=True)

Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity.[default: 0.28]

xn: Float % Range(1.0, None, inclusive_start=False)

No vote weight, corresponding to the parameter beta in the scoring function.[default: 8.0]

Outputs

chimeras: FeatureData[Sequence]

The chimeric sequences.[required]

nonchimeras: FeatureData[Sequence]

The non-chimeric sequences.[required]

stats: UchimeStats

Summary statistics from chimera checking.[required]
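
The roles of `dn` and `xn` are easier to see with the scoring function written out. The sketch below assumes the form published for UCHIME, in which a segment with Y "yes" votes, N "no" votes and A abstentions scores Y / (beta * (N + n) + A), and the overall score h is the product of the left and right segment scores. Treat this as an assumption to check against the vsearch documentation; the function names and the vote tuples are hypothetical.

```python
def segment_score(yes, no, abstain, xn=8.0, dn=1.4):
    """Score of one segment: Y / (beta * (N + n) + A), with beta=xn and n=dn."""
    return yes / (xn * (no + dn) + abstain)


def chimera_score(left_votes, right_votes, xn=8.0, dn=1.4):
    """Overall score h, assumed here to be the product of both segment scores.

    left_votes / right_votes: (yes, no, abstain) tuples for each segment.
    """
    left = segment_score(*left_votes, xn=xn, dn=dn)
    right = segment_score(*right_votes, xn=xn, dn=dn)
    return left * right


# Ten yes votes on each side and no contradicting votes.
example_clean = chimera_score((10, 0, 0), (10, 0, 0))
# A single no vote on the left side lowers the score sharply, since it is
# weighted by xn.
example_with_no_vote = chimera_score((10, 1, 0), (10, 0, 0))
```

Under this form, a larger `xn` penalizes each no vote more heavily, and a larger `dn` lowers every score, which makes it harder for weakly supported candidates to reach `minh`.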


vsearch fastq-stats

Generate a FASTQ overview using vsearch's fastq_stats, fastq_eestats and fastq_eestats2 utilities. Please see https://github.com/torognes/vsearch for detailed documentation of these tools.

Citations

Rognes et al., 2016

Inputs

sequences: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]

Fastq sequences[required]

Parameters

threads: Threads

The number of threads used for computation.[default: 1]

Outputs

visualization: Visualization

<no description>[required]


vsearch cluster-features-open-reference

Given a feature table and the associated feature sequences, cluster the features against a reference database based on a user-specified percent identity threshold of their sequences. Any sequences that don't match are then clustered de novo. This is not a general-purpose clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table is clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. For features that match a reference sequence, the centroid feature is that reference sequence, so its identifier will become the feature identifier. The clustered_sequences result will contain feature representative sequences that are derived from the sequences input for all features in clustered_table. The representative is always the most abundant sequence in the cluster. The new_reference_sequences result will contain the entire reference database, plus feature representative sequences for any de novo features. This is intended to be used as a reference database in subsequent iterations of cluster_features_open_reference, if applicable. See the vsearch documentation for details on how sequence clustering is performed.

Citations

Rognes et al., 2016; Rideout et al., 2014

Inputs

sequences: FeatureData[Sequence]

The sequences corresponding to the features in table.[required]

table: FeatureTable[Frequency]

The feature table to be clustered.[required]

reference_sequences: FeatureData[Sequence]

The sequences to use as cluster centroids.[required]

Parameters

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True)

The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]

strand: Str % Choices('plus', 'both')

Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']

threads: Threads

The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs

clustered_table: FeatureTable[Frequency]

The table following clustering of features.[required]

clustered_sequences: FeatureData[Sequence]

Sequences representing clustered features.[required]

new_reference_sequences: FeatureData[Sequence]

The new reference sequences. This can be used for subsequent runs of open-reference clustering for consistent definitions of features across open-reference feature tables.[required]
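
The two-stage logic of open-reference clustering described above can be sketched with a toy example. This is a simplified illustration under stated assumptions, not what vsearch does internally: `toy_identity` is a hypothetical stand-in for vsearch's alignment-based identity, the first reference that meets the threshold wins, and de novo centroids are picked greedily in order of decreasing abundance.

```python
def toy_identity(a, b):
    """Fraction of matching positions; 0.0 if lengths differ (toy stand-in)."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def first_hit(seq, candidates, perc_identity):
    """Return the ID of the first candidate meeting the threshold, or None."""
    for cand_id, cand_seq in candidates.items():
        if toy_identity(seq, cand_seq) >= perc_identity:
            return cand_id
    return None


def open_reference(seqs, abundance, reference, perc_identity):
    """seqs: {feature_id: sequence}; abundance: {feature_id: total count};
    reference: {ref_id: sequence}. Returns (assignment, new_reference)."""
    assignment, unmatched = {}, []
    # Stage 1: closed-reference search against the reference database.
    for fid, seq in seqs.items():
        hit = first_hit(seq, reference, perc_identity)
        if hit is None:
            unmatched.append(fid)
        else:
            assignment[fid] = hit
    # Stage 2: de novo clustering of whatever failed to match.
    centroids = {}
    for fid in sorted(unmatched, key=lambda f: -abundance[f]):
        hit = first_hit(seqs[fid], centroids, perc_identity)
        if hit is None:
            centroids[fid] = seqs[fid]  # this feature founds a new cluster
            hit = fid
        assignment[fid] = hit
    # New reference: the whole input reference plus the de novo centroids.
    return assignment, {**reference, **centroids}


example_ref = {"ref1": "AAAAAAAAAA"}
example_seqs = {
    "f1": "AAAAAAAAAA", "f2": "AAAAAAAAAT",
    "f3": "CCCCCCCCCC", "f4": "CCCCCCCCCG",
}
example_abundance = {"f1": 5, "f2": 3, "f3": 10, "f4": 2}
example_assignment, example_new_ref = open_reference(
    example_seqs, example_abundance, example_ref, perc_identity=0.9
)
```

In this example `f1` and `f2` take the identifier `ref1`, while `f3` and `f4` form a de novo cluster named after `f3`, the more abundant of the two. The returned reference then contains `ref1` and `f3`, which is what makes feature definitions consistent if it is reused in a later run.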

vsearch cluster-features-de-novo

Given a feature table and the associated feature sequences, cluster the features based on a user-specified percent identity threshold of their sequences. This is not a general-purpose de novo clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table is clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers and sequences will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.

Citations

Rognes et al., 2016

Inputs

sequences: FeatureData[Sequence]

The sequences corresponding to the features in table.[required]

table: FeatureTable[Frequency]

The feature table to be clustered.[required]

Parameters

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True)

The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]

strand: Str % Choices('plus', 'both')

Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']

threads: Threads

The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs

clustered_table: FeatureTable[Frequency]

The table following clustering of features.[required]

clustered_sequences: FeatureData[Sequence]

Sequences representing clustered features.[required]

Examples

cluster_features_de_novo

wget -O 'seqs1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'

wget -O 'table1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'

qiime vsearch cluster-features-de-novo \
  --i-sequences seqs1.qza \
  --i-table table1.qza \
  --p-perc-identity 0.97 \
  --p-strand plus \
  --p-threads 1 \
  --o-clustered-table clustered-table.qza \
  --o-clustered-sequences clustered-sequences.qza


This plugin wraps the vsearch application, and provides methods for clustering and dereplicating features and sequences.

version: 2024.10.0
website: https://github.com/qiime2/q2-vsearch
user support:
Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
citations:
Rognes et al., 2016

Actions

NameTypeShort Description
cluster-features-de-novomethodDe novo clustering of features.
cluster-features-closed-referencemethodClosed-reference clustering of features.
dereplicate-sequencesmethodDereplicate sequences.
merge-pairsmethodMerge paired-end reads.
uchime-refmethodReference-based chimera filtering with vsearch.
uchime-denovomethodDe novo chimera filtering with vsearch.
fastq-statsvisualizerFastq stats with vsearch.
cluster-features-open-referencepipelineOpen-reference clustering of features.

Artifact Classes

UchimeStats

Formats

UchimeStatsFmt
UchimeStatsDirFmt


vsearch cluster-features-de-novo

Given a feature table and the associated feature sequences, cluster the features based on user-specified percent identity threshold of their sequences. This is not a general-purpose de novo clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers and sequences will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.

Citations

Rognes et al., 2016

Inputs

sequences: FeatureData[Sequence]

The sequences corresponding to the features in table.[required]

table: FeatureTable[Frequency]

The feature table to be clustered.[required]

Parameters

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True)

The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]

strand: Str % Choices('plus', 'both')

Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']

threads: Threads

The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs

clustered_table: FeatureTable[Frequency]

The table following clustering of features.[required]

clustered_sequences: FeatureData[Sequence]

Sequences representing clustered features.[required]

Examples

cluster_features_de_novo

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
wget -O 'seqs1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'

wget -O 'table1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'

qiime vsearch cluster-features-de-novo \
  --i-sequences seqs1.qza \
  --i-table table1.qza \
  --p-perc-identity 0.97 \
  --p-strand plus \
  --p-threads 1 \
  --o-clustered-table clustered-table.qza \
  --o-clustered-sequences clustered-sequences.qza

vsearch cluster-features-closed-reference

Given a feature table and the associated feature sequences, cluster the features against a reference database based on user-specified percent identity threshold of their sequences. This is not a general-purpose closed-reference clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.

Citations

Rognes et al., 2016

Inputs

sequences: FeatureData[Sequence]

The sequences corresponding to the features in table.[required]

table: FeatureTable[Frequency]

The feature table to be clustered.[required]

reference_sequences: FeatureData[Sequence]

The sequences to use as cluster centroids.[required]

Parameters

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True)

The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]

strand: Str % Choices('plus', 'both')

Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']

threads: Threads

The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs

clustered_table: FeatureTable[Frequency]

The table following clustering of features.[required]

clustered_sequences: FeatureData[Sequence]

The sequences representing clustered features, relabeled by the reference IDs.[required]

unmatched_sequences: FeatureData[Sequence]

The sequences which failed to match any reference sequences. This output maps to vsearch's --notmatched parameter.[required]


vsearch dereplicate-sequences

Dereplicate sequence data and create a feature table and feature representative sequences. Feature identifiers in the resulting artifacts will be the sha1 hash of the sequence defining each feature. If clustering of features into OTUs is desired, the resulting artifacts can be passed to the cluster_features_* methods in this plugin.

Citations

Rognes et al., 2016

Inputs

sequences: SampleData[Sequences] | SampleData[SequencesWithQuality] | SampleData[JoinedSequencesWithQuality]

The sequences to be dereplicated.[required]

Parameters

derep_prefix: Bool

Merge sequences with identical prefixes. If a sequence is identical to the prefix of two or more longer sequences, it is clustered with the shortest of them. If they are equally long, it is clustered with the most abundant.[default: False]

min_seq_length: Int % Range(1, None)

Discard sequences shorter than this integer.[default: 1]

min_unique_size: Int % Range(1, None)

Discard sequences with a post-dereplication abundance value smaller than integer.[default: 1]

Outputs

dereplicated_table: FeatureTable[Frequency]

The table of dereplicated sequences.[required]

dereplicated_sequences: FeatureData[Sequence]

The dereplicated sequences.[required]


vsearch merge-pairs

Merge paired-end sequence reads using vsearch's merge_pairs function. See the vsearch documentation for details on how paired-end merging is performed, and for more information on the parameters to this method.

Citations

Rognes et al., 2016

Inputs

demultiplexed_seqs: SampleData[PairedEndSequencesWithQuality]

The demultiplexed paired-end sequences to be merged.[required]

Parameters

truncqual: Int % Range(0, None)

Truncate sequences at the first base with the specified quality score value or lower.[optional]

minlen: Int % Range(0, None)

Sequences shorter than minlen after truncation are discarded.[default: 1]

maxns: Int % Range(0, None)

Sequences with more than maxns N characters are discarded.[optional]

allowmergestagger: Bool

Allow merging of staggered read pairs.[default: False]

minovlen: Int % Range(5, None)

Minimum length of the area of overlap between reads during merging.[default: 10]

maxdiffs: Int % Range(0, None)

Maximum number of mismatches in the area of overlap during merging.[default: 10]

minmergelen: Int % Range(0, None)

Minimum length of the merged read to be retained.[optional]

maxmergelen: Int % Range(0, None)

Maximum length of the merged read to be retained.[optional]

maxee: Float % Range(0.0, None)

Maximum number of expected errors in the merged read to be retained.[optional]

threads: Threads

The number of threads to use for computation. Does not scale much past 4 threads.[default: 1]

Outputs

merged_sequences: SampleData[JoinedSequencesWithQuality]

The merged sequences.[required]

unmerged_sequences: SampleData[PairedEndSequencesWithQuality]

The unmerged paired-end reads.[required]


vsearch uchime-ref

Apply the vsearch uchime_ref method to identify chimeric feature sequences. The results of this method can be used to filter chimeric features from the corresponding feature table. For additional details, please refer to the vsearch documentation.

Citations

Rognes et al., 2016

Inputs

sequences: FeatureData[Sequence]

The feature sequences to be chimera-checked.[required]

table: FeatureTable[Frequency]

Feature table (used for computing total feature abundances).[required]

reference_sequences: FeatureData[Sequence]

The non-chimeric reference sequences.[required]

Parameters

dn: Float % Range(0.0, None)

No vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default: 1.4]

mindiffs: Int % Range(1, None)

Minimum number of differences per segment.[default: 3]

mindiv: Float % Range(0.0, None)

Minimum divergence from closest parent.[default: 0.8]

minh: Float % Range(0.0, 1.0, inclusive_end=True)

Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity.[default: 0.28]

xn: Float % Range(1.0, None, inclusive_start=False)

No vote weight, corresponding to the parameter beta in the scoring function.[default: 8.0]

threads: Threads

The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]
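
The type expressions in the parameter list above restrict each value to an interval. The sketch below shows one way to read them. It assumes the QIIME 2 Range defaults of an inclusive start and an exclusive end, with None meaning unbounded. The in_range helper is hypothetical and is not part of QIIME 2.

```python
from typing import Optional


def in_range(value: float, start: Optional[float], end: Optional[float],
             inclusive_start: bool = True, inclusive_end: bool = False) -> bool:
    """Return True if value lies in the interval described by the arguments."""
    if start is not None:
        if value < start or (value == start and not inclusive_start):
            return False
    if end is not None:
        if value > end or (value == end and not inclusive_end):
            return False
    return True


# xn: Float % Range(1.0, None, inclusive_start=False) excludes exactly 1.0.
xn_default_ok = in_range(8.0, 1.0, None, inclusive_start=False)
xn_one_ok = in_range(1.0, 1.0, None, inclusive_start=False)

# minh: Float % Range(0.0, 1.0, inclusive_end=True) accepts exactly 1.0.
minh_one_ok = in_range(1.0, 0.0, 1.0, inclusive_end=True)
```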

Outputs

chimeras: FeatureData[Sequence]

The chimeric sequences.[required]

nonchimeras: FeatureData[Sequence]

The non-chimeric sequences.[required]

stats: UchimeStats

Summary statistics from chimera checking.[required]
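
To show how dn and xn enter the scoring function mentioned under Parameters, here is a minimal sketch. It assumes the published UCHIME form of the score, Y / (beta * (N + n) + A), where Y, N and A are counts of yes, no and abstain votes, n is dn and beta is xn. The function names are hypothetical, the comparison against minh is an assumption, and the way vsearch combines scores across alignment segments is not modeled here.

```python
def vote_score(yes: float, no: float, abstain: float,
               dn: float = 1.4, xn: float = 8.0) -> float:
    """Score for one set of votes: Y / (xn * (N + dn) + A)."""
    return yes / (xn * (no + dn) + abstain)


def meets_minh(score: float, minh: float = 0.28) -> bool:
    """Assumed threshold check: a score at or above minh counts as a hit."""
    return score >= minh


# A single "no" vote lowers the score sharply because it is weighted by xn.
clean = vote_score(yes=10, no=0, abstain=0)
one_no = vote_score(yes=10, no=1, abstain=0)
```

Raising minh makes meets_minh stricter, which is consistent with the note above that a higher value tends to reduce false positives at the cost of sensitivity.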


vsearch uchime-denovo

Apply the vsearch uchime_denovo method to identify chimeric feature sequences. The results of this method can be used to filter chimeric features from the corresponding feature table. For more details, please refer to the vsearch documentation.

Citations

Rognes et al., 2016

Inputs

sequences: FeatureData[Sequence]

The feature sequences to be chimera-checked.[required]

table: FeatureTable[Frequency]

Feature table (used for computing total feature abundances).[required]

Parameters

dn: Float % Range(0.0, None)

No vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default: 1.4]

mindiffs: Int % Range(1, None)

Minimum number of differences per segment.[default: 3]

mindiv: Float % Range(0.0, None)

Minimum divergence from closest parent.[default: 0.8]

minh: Float % Range(0.0, 1.0, inclusive_end=True)

Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity.[default: 0.28]

xn: Float % Range(1.0, None, inclusive_start=False)

No vote weight, corresponding to the parameter beta in the scoring function.[default: 8.0]

Outputs

chimeras: FeatureData[Sequence]

The chimeric sequences.[required]

nonchimeras: FeatureData[Sequence]

The non-chimeric sequences.[required]

stats: UchimeStats

Summary statistics from chimera checking.[required]
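
The table input is used only to obtain a total abundance per feature, and the outputs can then be used to filter the table. The sketch below illustrates both steps on a plain dictionary. It is a toy model, not the plugin's code: the nested-dict table layout and the function names are hypothetical, and the ";size=" label suffix follows vsearch's abundance annotation convention.

```python
from typing import Dict, Iterable, List

Table = Dict[str, Dict[str, int]]  # sample id -> feature id -> count


def total_abundances(table: Table) -> Dict[str, int]:
    """Sum each feature's counts across all samples."""
    totals: Dict[str, int] = {}
    for counts in table.values():
        for fid, count in counts.items():
            totals[fid] = totals.get(fid, 0) + count
    return totals


def size_annotated_labels(totals: Dict[str, int]) -> List[str]:
    """Build 'id;size=N' labels, most abundant first (ties broken by id)."""
    ordered = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{fid};size={n}" for fid, n in ordered]


def drop_features(table: Table, to_drop: Iterable[str]) -> Table:
    """Return a copy of the table without the given feature ids."""
    drop = set(to_drop)
    return {s: {f: c for f, c in counts.items() if f not in drop}
            for s, counts in table.items()}


table = {"s1": {"a": 3, "b": 10}, "s2": {"a": 2}}
labels = size_annotated_labels(total_abundances(table))
filtered = drop_features(table, ["a"])  # pretend "a" was flagged as chimeric
```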


vsearch fastq-stats

Generate an overview of FASTQ data using vsearch's fastq_stats, fastq_eestats, and fastq_eestats2 utilities. See https://github.com/torognes/vsearch for detailed documentation of these tools.

Citations

Rognes et al., 2016

Inputs

sequences: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]

The FASTQ sequences to summarize.[required]

Parameters

threads: Threads

The number of threads used for computation.[default: 1]

Outputs

visualization: Visualization

<no description>[required]


vsearch cluster-features-open-reference

Given a feature table and the associated feature sequences, cluster the features against a reference database based on a user-specified percent identity threshold of their sequences. Any sequences that don't match are then clustered de novo. This is not a general-purpose clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table is clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. For features that match a reference sequence, the centroid feature is that reference sequence, so its identifier will become the feature identifier. The clustered_sequences result will contain feature representative sequences that are derived from the sequences input for all features in clustered_table. Each representative will always be the most abundant sequence in its cluster. The new_reference_sequences result will contain the entire reference database, plus feature representative sequences for any de novo features. It is intended to be used as a reference database in subsequent iterations of cluster_features_open_reference, if applicable. See the vsearch documentation for details on how sequence clustering is performed.
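
The frequency rule described above, where a clustered feature's count in a sample is the sum of its members' counts in that sample, can be sketched as follows. This is a toy illustration on plain dictionaries. The collapse_table function and the feature-to-centroid mapping are hypothetical stand-ins for what the clustering step produces.

```python
from typing import Dict

Table = Dict[str, Dict[str, int]]  # sample id -> feature id -> count


def collapse_table(table: Table, feature_to_centroid: Dict[str, str]) -> Table:
    """Sum member-feature counts into their centroid, sample by sample."""
    clustered: Table = {}
    for sample, counts in table.items():
        out = clustered.setdefault(sample, {})
        for fid, count in counts.items():
            centroid = feature_to_centroid[fid]
            out[centroid] = out.get(centroid, 0) + count
    return clustered


table = {"s1": {"f1": 4, "f2": 1, "f3": 7}, "s2": {"f2": 5}}
# f1 and f2 matched reference "ref1"; f3 is its own de novo centroid.
mapping = {"f1": "ref1", "f2": "ref1", "f3": "f3"}
clustered = collapse_table(table, mapping)
```

Here the features that matched a reference take the reference identifier, while the unmatched feature keeps its own identifier as a de novo centroid.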

Citations

Rognes et al., 2016; Rideout et al., 2014

Inputs

sequences: FeatureData[Sequence]

The sequences corresponding to the features in table.[required]

table: FeatureTable[Frequency]

The feature table to be clustered.[required]

reference_sequences: FeatureData[Sequence]

The sequences to use as cluster centroids.[required]

Parameters

perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True)

The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]

strand: Str % Choices('plus', 'both')

Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']

threads: Threads

The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]

Outputs

clustered_table: FeatureTable[Frequency]

The table following clustering of features.[required]

clustered_sequences: FeatureData[Sequence]

Sequences representing clustered features.[required]

new_reference_sequences: FeatureData[Sequence]

The new reference sequences. This can be used for subsequent runs of open-reference clustering for consistent definitions of features across open-reference feature tables.[required]
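
The overall flow of open-reference clustering, and how new_reference_sequences is assembled, can be sketched with a toy model. Everything here is an assumption made for illustration: identity is a naive position-by-position comparison of equal-length strings rather than vsearch's alignment-based identity, the de novo step is a simple greedy pass in order of decreasing abundance, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def identity(a: str, b: str) -> float:
    """Toy identity: fraction of matching positions, no alignment."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def open_reference(seqs: Dict[str, str], abundances: Dict[str, int],
                   reference: Dict[str, str], perc_identity: float
                   ) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (feature -> centroid id, new reference sequences)."""
    assignment: Dict[str, str] = {}
    unmatched: List[str] = []
    # Step 1: closed-reference matching against the reference sequences.
    for fid, seq in seqs.items():
        best_id, best_score = None, 0.0
        for rid, rseq in reference.items():
            score = identity(seq, rseq)
            if score >= perc_identity and score > best_score:
                best_id, best_score = rid, score
        if best_id is None:
            unmatched.append(fid)
        else:
            assignment[fid] = best_id
    # Step 2: greedy de novo clustering of whatever did not match.
    new_centroids: Dict[str, str] = {}
    for fid in sorted(unmatched, key=lambda f: (-abundances[f], f)):
        for cid, cseq in new_centroids.items():
            if identity(seqs[fid], cseq) >= perc_identity:
                assignment[fid] = cid
                break
        else:
            new_centroids[fid] = seqs[fid]
            assignment[fid] = fid
    # Step 3: the new reference is the old reference plus de novo centroids.
    new_reference = dict(reference)
    new_reference.update(new_centroids)
    return assignment, new_reference


reference = {"r1": "AAAAAAAAAA"}
seqs = {"f1": "AAAAAAAAAT", "f2": "CCCCCCCCCC", "f3": "CCCCCCCCCG"}
abundances = {"f1": 5, "f2": 10, "f3": 2}
assignment, new_reference = open_reference(seqs, abundances, reference, 0.9)
```

Passing new_reference back in as the reference for a later run is what keeps feature definitions consistent across tables in this toy model.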
