This plugin wraps the vsearch application, and provides methods for clustering and dereplicating features and sequences.
- version: 2024.10.0
- website: https://github.com/qiime2/q2-vsearch
- user support: Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
- citations: Rognes et al., 2016
Actions¶
Name | Type | Short Description |
---|---|---|
cluster-features-de-novo | method | De novo clustering of features. |
cluster-features-closed-reference | method | Closed-reference clustering of features. |
dereplicate-sequences | method | Dereplicate sequences. |
merge-pairs | method | Merge paired-end reads. |
uchime-ref | method | Reference-based chimera filtering with vsearch. |
uchime-denovo | method | De novo chimera filtering with vsearch. |
fastq-stats | visualizer | Fastq stats with vsearch. |
cluster-features-open-reference | pipeline | Open-reference clustering of features. |
Artifact Classes¶
UchimeStats |
Formats¶
UchimeStatsFmt |
UchimeStatsDirFmt |
vsearch cluster-features-de-novo¶
Given a feature table and the associated feature sequences, cluster the features based on a user-specified percent identity threshold of their sequences. This is not a general-purpose de novo clustering method; it is intended for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table is clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the clustered features in that sample. Feature identifiers and sequences are inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.
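The frequency rule described above can be sketched in plain Python. This is only an illustration, not the plugin's implementation: the nested-dict table layout, the `collapse_table` helper, and the toy feature IDs are all hypothetical, and the feature-to-centroid mapping is assumed to be already known.

```python
from collections import defaultdict


def collapse_table(table, feature_to_centroid):
    """Sum per-sample frequencies of features that share a centroid.

    table: {sample_id: {feature_id: frequency}} (hypothetical layout)
    feature_to_centroid: {feature_id: centroid_feature_id}
    """
    clustered = {}
    for sample_id, freqs in table.items():
        summed = defaultdict(int)
        for feature_id, freq in freqs.items():
            # The clustered feature takes the centroid's identifier.
            summed[feature_to_centroid[feature_id]] += freq
        clustered[sample_id] = dict(summed)
    return clustered


# Toy data: f2 clusters with centroid f1; f3 is its own centroid.
table = {
    "sample-a": {"f1": 10, "f2": 5, "f3": 2},
    "sample-b": {"f1": 0, "f2": 7, "f3": 1},
}
feature_to_centroid = {"f1": "f1", "f2": "f1", "f3": "f3"}

clustered_table = collapse_table(table, feature_to_centroid)
print(clustered_table)
```

In this toy case, feature f1 in sample-a ends up with frequency 15, the sum of the original f1 and f2 counts in that sample.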
Citations¶
Inputs¶
- sequences: FeatureData[Sequence]
  The sequences corresponding to the features in table. [required]
- table: FeatureTable[Frequency]
  The feature table to be clustered. [required]
Parameters¶
- perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True)
  The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter. [required]
- strand: Str % Choices('plus', 'both')
  Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands. [default: 'plus']
- threads: Threads
  The number of threads to use for computation. Passing 0 will launch one thread per CPU core. [default: 1]
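The Range notation used for perc_identity describes a half-open interval: values must be greater than 0 and at most 1, so 0.97 is accepted while 97 is not. A minimal sketch of that check follows. The `in_range` helper is hypothetical, and its defaults (start inclusive, end exclusive) are an assumption inferred from the fact that this page only lists the flags that differ from them.

```python
def in_range(value, start, end, inclusive_start=True, inclusive_end=False):
    """Return True if value lies in the interval; None means unbounded."""
    if start is not None:
        if value < start or (value == start and not inclusive_start):
            return False
    if end is not None:
        if value > end or (value == end and not inclusive_end):
            return False
    return True


def valid_perc_identity(value):
    # Mirrors Range(0, 1, inclusive_start=False, inclusive_end=True).
    return in_range(value, 0, 1, inclusive_start=False, inclusive_end=True)


for candidate in (0.97, 1.0, 0, 97):
    print(candidate, valid_perc_identity(candidate))
```

The same helper covers the other ranges on this page, for example `in_range(5, 1, None)` for a parameter declared as Range(1, None).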
Outputs¶
- clustered_table: FeatureTable[Frequency]
  The table following clustering of features. [required]
- clustered_sequences: FeatureData[Sequence]
  Sequences representing clustered features. [required]
Examples¶
cluster_features_de_novo¶
# Command line: download the example inputs.
wget -O 'seqs1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
wget -O 'table1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
# Cluster the features de novo at 97% identity.
qiime vsearch cluster-features-de-novo \
  --i-sequences seqs1.qza \
  --i-table table1.qza \
  --p-perc-identity 0.97 \
  --p-strand plus \
  --p-threads 1 \
  --o-clustered-table clustered-table.qza \
  --o-clustered-sequences clustered-sequences.qza
# Python API: download the example inputs and load them as Artifacts.
from qiime2 import Artifact
from urllib import request
import qiime2.plugins.vsearch.actions as vsearch_actions
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn = 'seqs1.qza'
request.urlretrieve(url, fn)
seqs1 = Artifact.load(fn)
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn = 'table1.qza'
request.urlretrieve(url, fn)
table1 = Artifact.load(fn)
# Cluster the features de novo at 97% identity.
clustered_table, clustered_sequences = vsearch_actions.cluster_features_de_novo(
    sequences=seqs1,
    table=table1,
    perc_identity=0.97,
    strand='plus',
    threads=1,
)
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: seqs1.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: table1.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 vsearch cluster-features-de-novo tool:
  - Set "sequences" to #: seqs1.qza
  - Set "table" to #: table1.qza
  - Set "perc_identity" to 0.97
  - Expand the additional options section
    - Leave "strand" as its default value of plus
  - Press the Execute button.
# R API (via reticulate): download the example inputs and load them as Artifacts.
library(reticulate)
Artifact <- import("qiime2")$Artifact
request <- import("urllib")$request
vsearch_actions <- import("qiime2.plugins.vsearch.actions")
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn <- 'seqs1.qza'
request$urlretrieve(url, fn)
seqs1 <- Artifact$load(fn)
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn <- 'table1.qza'
request$urlretrieve(url, fn)
table1 <- Artifact$load(fn)
# Cluster the features de novo at 97% identity.
action_results <- vsearch_actions$cluster_features_de_novo(
    sequences=seqs1,
    table=table1,
    perc_identity=0.97,
    strand='plus',
    threads=1L,
)
clustered_table <- action_results$clustered_table
clustered_sequences <- action_results$clustered_sequences
from q2_vsearch._examples import cluster_features_de_novo
cluster_features_de_novo(use)
vsearch cluster-features-closed-reference¶
Given a feature table and the associated feature sequences, cluster the features against a reference database based on a user-specified percent identity threshold of their sequences. This is not a general-purpose closed-reference clustering method; it is intended for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table is clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the clustered features in that sample. Feature identifiers are inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.
Citations¶
Inputs¶
- sequences: FeatureData[Sequence]
  The sequences corresponding to the features in table. [required]
- table: FeatureTable[Frequency]
  The feature table to be clustered. [required]
- reference_sequences: FeatureData[Sequence]
  The sequences to use as cluster centroids. [required]
Parameters¶
- perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True)
  The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter. [required]
- strand: Str % Choices('plus', 'both')
  Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands. [default: 'plus']
- threads: Threads
  The number of threads to use for computation. Passing 0 will launch one thread per CPU core. [default: 1]
Outputs¶
- clustered_table: FeatureTable[Frequency]
  The table following clustering of features. [required]
- clustered_sequences: FeatureData[Sequence]
  The sequences representing clustered features, relabeled by the reference IDs. [required]
- unmatched_sequences: FeatureData[Sequence]
  The sequences which failed to match any reference sequences. This output maps to vsearch's --notmatched parameter. [required]
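How features are split between reference-labeled clusters and unmatched_sequences can be sketched with a toy matcher. This is an illustration under loud assumptions: the `identity` function below simply counts matching positions between equal-length strings, whereas vsearch computes identity from pairwise alignments, and the helper names and sequences are hypothetical.

```python
def identity(a, b):
    """Toy identity: fraction of matching positions (0.0 if lengths differ).

    Assumption: stands in for vsearch's alignment-based identity.
    """
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closed_reference(sequences, reference, perc_identity):
    """Assign each feature to its best reference hit at or above the threshold."""
    assignments, unmatched = {}, {}
    for feature_id, seq in sequences.items():
        best_id, best_score = None, 0.0
        for ref_id, ref_seq in reference.items():
            score = identity(seq, ref_seq)
            if score >= perc_identity and score > best_score:
                best_id, best_score = ref_id, score
        if best_id is None:
            unmatched[feature_id] = seq  # would go to unmatched_sequences
        else:
            assignments[feature_id] = best_id  # relabeled by the reference ID
    return assignments, unmatched


sequences = {"f1": "ACGTACGTAC", "f2": "ACGTACGTAA", "f3": "TTTTTTTTTT"}
reference = {"ref1": "ACGTACGTAC", "ref2": "GGGGGGGGGG"}

assignments, unmatched = closed_reference(sequences, reference, 0.9)
print(assignments)
print(unmatched)
```

Here f1 and f2 both take the identifier ref1, while f3 matches no reference sequence and is reported as unmatched.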
vsearch dereplicate-sequences¶
Dereplicate sequence data and create a feature table and feature representative sequences. Feature identifiers in the resulting artifacts will be the sha1 hash of the sequence defining each feature. If clustering of features into OTUs is desired, the resulting artifacts can be passed to the cluster_features_* methods in this plugin.
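A minimal sketch of the idea, not the plugin's code: identical reads are collapsed into one feature, counted per sample, and keyed by the SHA-1 hex digest of the sequence. The `dereplicate` helper and the input layout are hypothetical, and the exact byte encoding and letter case that vsearch hashes are assumptions here.

```python
import hashlib
from collections import Counter, defaultdict


def dereplicate(reads_by_sample):
    """Collapse identical reads and key each unique sequence by its SHA-1 digest.

    reads_by_sample: {sample_id: [sequence, ...]} (hypothetical layout)
    Returns (table, representative_sequences).
    """
    table = defaultdict(Counter)
    rep_seqs = {}
    for sample_id, reads in reads_by_sample.items():
        for read in reads:
            # Assumption: the hash is taken over the ASCII sequence as given.
            feature_id = hashlib.sha1(read.encode("ascii")).hexdigest()
            rep_seqs[feature_id] = read
            table[sample_id][feature_id] += 1
    return {s: dict(c) for s, c in table.items()}, rep_seqs


reads_by_sample = {"s1": ["ACGT", "ACGT", "TTGA"], "s2": ["ACGT"]}
derep_table, derep_seqs = dereplicate(reads_by_sample)
print(derep_table)
```

Because the identifier depends only on the sequence, the same sequence receives the same feature ID in every sample and in every run.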
Citations¶
Inputs¶
- sequences: SampleData[Sequences] | SampleData[SequencesWithQuality] | SampleData[JoinedSequencesWithQuality]
  The sequences to be dereplicated. [required]
Parameters¶
- derep_prefix: Bool
  Merge sequences with identical prefixes. If a sequence is identical to the prefix of two or more longer sequences, it is clustered with the shortest of them. If they are equally long, it is clustered with the most abundant. [default: False]
- min_seq_length: Int % Range(1, None)
  Discard sequences shorter than this integer. [default: 1]
- min_unique_size: Int % Range(1, None)
  Discard sequences with a post-dereplication abundance value smaller than this integer. [default: 1]
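The tie-breaking rule stated for derep_prefix can be sketched as follows. This is only an illustration of the rule as worded on this page, not vsearch's implementation. The `choose_prefix_target` helper and the toy abundances are hypothetical.

```python
def choose_prefix_target(seq, candidates):
    """Pick the longer sequence that seq would be merged into.

    candidates: {sequence: abundance}
    Returns None if seq is not a proper prefix of any candidate.
    """
    longer = [c for c in candidates if len(c) > len(seq) and c.startswith(seq)]
    if not longer:
        return None
    # Shortest candidate wins; equal lengths are broken by higher abundance.
    return min(longer, key=lambda c: (len(c), -candidates[c]))


candidates = {"ACGTA": 3, "ACGTC": 8, "ACGTAGG": 20, "TTTT": 5}
target = choose_prefix_target("ACGT", candidates)
print(target)
```

In this toy case "ACGT" is a prefix of three sequences. The two shortest have equal length, so the more abundant one, "ACGTC", is chosen even though "ACGTAGG" has the highest abundance overall.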
Outputs¶
- dereplicated_table: FeatureTable[Frequency]
  The table of dereplicated sequences. [required]
- dereplicated_sequences: FeatureData[Sequence]
  The dereplicated sequences. [required]
vsearch merge-pairs¶
Merge paired-end sequence reads using vsearch's merge_pairs function. See the vsearch documentation for details on how paired-end merging is performed, and for more information on the parameters to this method.
Citations¶
Inputs¶
- demultiplexed_seqs: SampleData[PairedEndSequencesWithQuality]
  The demultiplexed paired-end sequences to be merged. [required]
Parameters¶
- truncqual: Int % Range(0, None)
  Truncate sequences at the first base with the specified quality score value or lower. [optional]
- minlen: Int % Range(0, None)
  Sequences shorter than minlen after truncation are discarded. [default: 1]
- maxns: Int % Range(0, None)
  Sequences with more than maxns N characters are discarded. [optional]
- allowmergestagger: Bool
  Allow merging of staggered read pairs. [default: False]
- minovlen: Int % Range(5, None)
  Minimum length of the area of overlap between reads during merging. [default: 10]
- maxdiffs: Int % Range(0, None)
  Maximum number of mismatches in the area of overlap during merging. [default: 10]
- minmergelen: Int % Range(0, None)
  Minimum length of the merged read to be retained. [optional]
- maxmergelen: Int % Range(0, None)
  Maximum length of the merged read to be retained. [optional]
- maxee: Float % Range(0.0, None)
  Maximum number of expected errors in the merged read to be retained. [optional]
- threads: Threads
  The number of threads to use for computation. Does not scale much past 4 threads. [default: 1]
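Two of the parameters above, truncqual and maxee, are easier to reason about with a small worked example. The sketch below is not vsearch's code. It assumes integer Phred scores and uses the standard conversion from a quality score Q to an error probability, p = 10^(-Q/10), so that the expected number of errors in a read is the sum of p over its bases. The helper names are hypothetical.

```python
def truncate_at_quality(seq, quals, truncqual):
    """Cut the read at the first base whose quality is <= truncqual."""
    for i, q in enumerate(quals):
        if q <= truncqual:
            return seq[:i], quals[:i]
    return seq, quals


def expected_errors(quals):
    """Sum of per-base error probabilities, p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in quals)


seq, quals = truncate_at_quality("ACGTAC", [30, 30, 30, 2, 30, 30], truncqual=2)
print(seq, quals)

ee = expected_errors([10, 20, 30])
print(round(ee, 3))
```

A merged read would then be kept only if its expected error count does not exceed maxee. For instance, a read whose qualities are 10, 20 and 30 has about 0.111 expected errors.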
Outputs¶
- merged_sequences: SampleData[JoinedSequencesWithQuality]
  The merged sequences. [required]
- unmerged_sequences: SampleData[PairedEndSequencesWithQuality]
  The unmerged paired-end reads. [required]
vsearch uchime-ref¶
Apply the vsearch uchime_ref method to identify chimeric feature sequences. The results of this method can be used to filter chimeric features from the corresponding feature table. For additional details, please refer to the vsearch documentation.
Citations¶
Inputs¶
- sequences: FeatureData[Sequence]
  The feature sequences to be chimera-checked. [required]
- table: FeatureTable[Frequency]
  Feature table (used for computing total feature abundances). [required]
- reference_sequences: FeatureData[Sequence]
  The non-chimeric reference sequences. [required]
Parameters¶
- dn: Float % Range(0.0, None)
  No vote pseudo-count, corresponding to the parameter n in the chimera scoring function. [default: 1.4]
- mindiffs: Int % Range(1, None)
  Minimum number of differences per segment. [default: 3]
- mindiv: Float % Range(0.0, None)
  Minimum divergence from closest parent. [default: 0.8]
- minh: Float % Range(0.0, 1.0, inclusive_end=True)
  Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity. [default: 0.28]
- xn: Float % Range(1.0, None, inclusive_start=False)
  No vote weight, corresponding to the parameter beta in the scoring function. [default: 8.0]
- threads: Threads
  The number of threads to use for computation. Passing 0 will launch one thread per CPU core. [default: 1]
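To make the roles of dn, xn and minh more concrete, here is a sketch of the vote-based score in the form published for the UCHIME algorithm: each segment scores Y / (beta * (N + n) + A), where Y, N and A are the yes, no and abstain vote counts, n is dn and beta is xn. Treat this as an assumption-laden illustration rather than vsearch's exact implementation. In particular, combining the left and right segment scores by multiplication is an assumption, and the function names and vote counts are hypothetical.

```python
def segment_score(yes, no, abstain, dn=1.4, xn=8.0):
    """Score one segment: Y / (beta * (N + n) + A), with n = dn and beta = xn."""
    return yes / (xn * (no + dn) + abstain)


def chimera_score(left, right, dn=1.4, xn=8.0):
    """Combine segment scores; left and right are (yes, no, abstain) tuples.

    Assumption: the overall score h is the product of the two segment scores.
    """
    return segment_score(*left, dn=dn, xn=xn) * segment_score(*right, dn=dn, xn=xn)


minh = 0.28
clean_votes = chimera_score((10, 0, 0), (10, 0, 0))
one_no_each = chimera_score((10, 1, 0), (10, 1, 0))
print(round(clean_votes, 3), clean_votes >= minh)
print(round(one_no_each, 3), one_no_each >= minh)
```

With these toy counts, ten yes votes per segment clear the default minh of 0.28, while adding a single no vote to each segment drops the score just below it. This shows how heavily xn weights no votes.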
Outputs¶
- chimeras: FeatureData[Sequence]
  The chimeric sequences. [required]
- nonchimeras: FeatureData[Sequence]
  The non-chimeric sequences. [required]
- stats: UchimeStats
  Summary statistics from chimera checking. [required]
vsearch uchime-denovo¶
Apply the vsearch uchime_denovo method to identify chimeric feature sequences. The results of this method can be used to filter chimeric features from the corresponding feature table. For more details, please refer to the vsearch documentation.
Citations¶
Inputs¶
- sequences: FeatureData[Sequence]
  The feature sequences to be chimera-checked. [required]
- table: FeatureTable[Frequency]
  Feature table (used for computing total feature abundances). [required]
Parameters¶
- dn: Float % Range(0.0, None)
  No vote pseudo-count, corresponding to the parameter n in the chimera scoring function. [default: 1.4]
- mindiffs: Int % Range(1, None)
  Minimum number of differences per segment. [default: 3]
- mindiv: Float % Range(0.0, None)
  Minimum divergence from closest parent. [default: 0.8]
- minh: Float % Range(0.0, 1.0, inclusive_end=True)
  Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity. [default: 0.28]
- xn: Float % Range(1.0, None, inclusive_start=False)
  No vote weight, corresponding to the parameter beta in the scoring function. [default: 8.0]
Outputs¶
- chimeras: FeatureData[Sequence]
  The chimeric sequences. [required]
- nonchimeras: FeatureData[Sequence]
  The non-chimeric sequences. [required]
- stats: UchimeStats
  Summary statistics from chimera checking. [required]
vsearch fastq-stats¶
A fastq overview via vsearch's fastq_stats, fastq_eestats and fastq_eestats2 utilities. Please see https://
Citations¶
Inputs¶
- sequences: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]
  Fastq sequences. [required]
Parameters¶
- threads: Threads
  The number of threads used for computation. [default: 1]
Outputs¶
- visualization: Visualization
  <no description> [required]
vsearch cluster-features-open-reference¶
Given a feature table and the associated feature sequences, cluster the features against a reference database based on a user-specified percent identity threshold of their sequences. Any sequences that don't match are then clustered de novo. This is not a general-purpose clustering method; it is intended for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table is clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the clustered features in that sample. Feature identifiers are inherited from the centroid feature of each cluster. For features that match a reference sequence, the centroid feature is that reference sequence, so its identifier becomes the feature identifier. The clustered_sequences result contains feature representative sequences, derived from the sequences input, for all features in clustered_table. Each representative is always the most abundant sequence in its cluster. The new_reference_sequences result contains the entire reference database plus feature representative sequences for any de novo features. It is intended to be used as the reference database in subsequent iterations of cluster_features_open_reference, if applicable. See the vsearch documentation for details on how sequence clustering is performed.
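How the two sequence outputs relate to each other can be sketched in plain Python. This is an illustration of the description above, not the pipeline's code. Cluster membership is assumed to be already known (matched features map to a reference ID, the rest to a de novo centroid ID), and the `open_reference_outputs` helper and all toy data are hypothetical.

```python
def open_reference_outputs(sequences, abundance, membership, reference):
    """Build clustered_sequences and new_reference_sequences from known clusters.

    membership: {feature_id: cluster_id}, where cluster_id is a reference ID
    for matched features or a de novo centroid ID otherwise.
    """
    members = {}
    for feature_id, cluster_id in membership.items():
        members.setdefault(cluster_id, []).append(feature_id)

    # Representative = most abundant input sequence in each cluster.
    clustered_sequences = {
        cluster_id: sequences[max(ids, key=lambda f: abundance[f])]
        for cluster_id, ids in members.items()
    }

    # New reference = whole reference database plus de novo representatives.
    new_reference = dict(reference)
    for cluster_id, seq in clustered_sequences.items():
        if cluster_id not in reference:
            new_reference[cluster_id] = seq
    return clustered_sequences, new_reference


sequences = {"f1": "ACGTAC", "f2": "ACGTAA", "f3": "TTTTGG", "f4": "TTTTGC"}
abundance = {"f1": 5, "f2": 12, "f3": 9, "f4": 1}
membership = {"f1": "ref1", "f2": "ref1", "f3": "f3", "f4": "f3"}
reference = {"ref1": "ACGTAC", "ref2": "GGGGGG"}

clustered_sequences, new_reference = open_reference_outputs(
    sequences, abundance, membership, reference
)
print(clustered_sequences)
print(sorted(new_reference))
```

Note that in this sketch the representative for ref1 comes from the input sequences (f2, the most abundant member), while new_reference keeps the original reference sequence under the same ID and gains one entry for the de novo cluster f3.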
Citations¶
Rognes et al., 2016; Rideout et al., 2014
Inputs¶
- sequences: FeatureData[Sequence]
  The sequences corresponding to the features in table. [required]
- table: FeatureTable[Frequency]
  The feature table to be clustered. [required]
- reference_sequences: FeatureData[Sequence]
  The sequences to use as cluster centroids. [required]
Parameters¶
- perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True)
  The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter. [required]
- strand: Str % Choices('plus', 'both')
  Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands. [default: 'plus']
- threads: Threads
  The number of threads to use for computation. Passing 0 will launch one thread per CPU core. [default: 1]
Outputs¶
- clustered_table: FeatureTable[Frequency]
  The table following clustering of features. [required]
- clustered_sequences: FeatureData[Sequence]
  Sequences representing clustered features. [required]
- new_reference_sequences: FeatureData[Sequence]
  The new reference sequences. This can be used for subsequent runs of open-reference clustering for consistent definitions of features across open-reference feature tables. [required]
This plugin wraps the vsearch application, and provides methods for clustering and dereplicating features and sequences.
- version:
2024.10.0
- website: https://
github .com /qiime2 /q2 -vsearch - user support:
- Please post to the QIIME 2 forum for help with this plugin: https://
forum .qiime2 .org - citations:
- Rognes et al., 2016
Actions¶
Name | Type | Short Description |
---|---|---|
cluster-features-de-novo | method | De novo clustering of features. |
cluster-features-closed-reference | method | Closed-reference clustering of features. |
dereplicate-sequences | method | Dereplicate sequences. |
merge-pairs | method | Merge paired-end reads. |
uchime-ref | method | Reference-based chimera filtering with vsearch. |
uchime-denovo | method | De novo chimera filtering with vsearch. |
fastq-stats | visualizer | Fastq stats with vsearch. |
cluster-features-open-reference | pipeline | Open-reference clustering of features. |
Artifact Classes¶
UchimeStats |
Formats¶
UchimeStatsFmt |
UchimeStatsDirFmt |
vsearch cluster-features-de-novo¶
Given a feature table and the associated feature sequences, cluster the features based on user-specified percent identity threshold of their sequences. This is not a general-purpose de novo clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers and sequences will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.
Citations¶
Inputs¶
- sequences:
FeatureData[Sequence]
The sequences corresponding to the features in table.[required]
- table:
FeatureTable[Frequency]
The feature table to be clustered.[required]
Parameters¶
- perc_identity:
Float
%
Range
(0, 1, inclusive_start=False, inclusive_end=True)
The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]
- strand:
Str
%
Choices
('plus', 'both')
Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default:
'plus'
]- threads:
Threads
The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default:
1
]
Outputs¶
- clustered_table:
FeatureTable[Frequency]
The table following clustering of features.[required]
- clustered_sequences:
FeatureData[Sequence]
Sequences representing clustered features.[required]
Examples¶
cluster_features_de_novo¶
wget -O 'seqs1.qza' \
'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
wget -O 'table1.qza' \
'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
qiime vsearch cluster-features-de-novo \
--i-sequences seqs1.qza \
--i-table table1.qza \
--p-perc-identity 0.97 \
--p-strand plus \
--p-threads 1 \
--o-clustered-table clustered-table.qza \
--o-clustered-sequences clustered-sequences.qza
from qiime2 import Artifact
from urllib import request
import qiime2.plugins.vsearch.actions as vsearch_actions
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn = 'seqs1.qza'
request.urlretrieve(url, fn)
seqs1 = Artifact.load(fn)
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn = 'table1.qza'
request.urlretrieve(url, fn)
table1 = Artifact.load(fn)
clustered_table, clustered_sequences = vsearch_actions.cluster_features_de_novo(
sequences=seqs1,
table=table1,
perc_identity=0.97,
strand='plus',
threads=1,
)
- Using the
Upload Data
tool: - On the first tab (Regular), press the
Paste/Fetch
data button at the bottom.- Set "Name" (first text-field) to:
seqs1.qza
- In the larger text-area, copy-and-paste: https://
amplicon -docs .qiime2 .org /en /latest /data /examples /vsearch /cluster -features -de -novo /1 /seqs1 .qza - ("Type", "Genome", and "Settings" can be ignored)
- Set "Name" (first text-field) to:
- Press the
Start
button at the bottom.
- On the first tab (Regular), press the
- Using the
Upload Data
tool: - On the first tab (Regular), press the
Paste/Fetch
data button at the bottom.- Set "Name" (first text-field) to:
table1.qza
- In the larger text-area, copy-and-paste: https://
amplicon -docs .qiime2 .org /en /latest /data /examples /vsearch /cluster -features -de -novo /1 /table1 .qza - ("Type", "Genome", and "Settings" can be ignored)
- Set "Name" (first text-field) to:
- Press the
Start
button at the bottom.
- On the first tab (Regular), press the
- Using the
qiime2 vsearch cluster-features-de-novo
tool: - Set "sequences" to
#: seqs1.qza
- Set "table" to
#: table1.qza
- Set "perc_identity" to
0.97
- Expand the
additional options
section- Leave "strand" as its default value of
plus
- Leave "strand" as its default value of
- Press the
Execute
button.
- Set "sequences" to
library(reticulate)
Artifact <- import("qiime2")$Artifact
request <- import("urllib")$request
vsearch_actions <- import("qiime2.plugins.vsearch.actions")
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn <- 'seqs1.qza'
request$urlretrieve(url, fn)
seqs1 <- Artifact$load(fn)
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn <- 'table1.qza'
request$urlretrieve(url, fn)
table1 <- Artifact$load(fn)
action_results <- vsearch_actions$cluster_features_de_novo(
sequences=seqs1,
table=table1,
perc_identity=0.97,
strand='plus',
threads=1L,
)
clustered_table <- action_results$clustered_table
clustered_sequences <- action_results$clustered_sequences
from q2_vsearch._examples import cluster_features_de_novo
cluster_features_de_novo(use)
vsearch cluster-features-closed-reference¶
Given a feature table and the associated feature sequences, cluster the features against a reference database based on user-specified percent identity threshold of their sequences. This is not a general-purpose closed-reference clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.
Citations¶
Inputs¶
- sequences:
FeatureData[Sequence]
The sequences corresponding to the features in table.[required]
- table:
FeatureTable[Frequency]
The feature table to be clustered.[required]
- reference_sequences:
FeatureData[Sequence]
The sequences to use as cluster centroids.[required]
Parameters¶
- perc_identity:
Float
%
Range
(0, 1, inclusive_start=False, inclusive_end=True)
The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]
- strand:
Str
%
Choices
('plus', 'both')
Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default:
'plus'
]- threads:
Threads
The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default:
1
]
Outputs¶
- clustered_table:
FeatureTable[Frequency]
The table following clustering of features.[required]
- clustered_sequences:
FeatureData[Sequence]
The sequences representing clustered features, relabeled by the reference IDs.[required]
- unmatched_sequences:
FeatureData[Sequence]
The sequences which failed to match any reference sequences. This output maps to vsearch's --notmatched parameter.[required]
vsearch dereplicate-sequences¶
Dereplicate sequence data and create a feature table and feature representative sequences. Feature identifiers in the resulting artifacts will be the sha1 hash of the sequence defining each feature. If clustering of features into OTUs is desired, the resulting artifacts can be passed to the cluster_features_* methods in this plugin.
Citations¶
Inputs¶
- sequences:
SampleData[Sequences]
|
SampleData[SequencesWithQuality]
|
SampleData[JoinedSequencesWithQuality]
The sequences to be dereplicated.[required]
Parameters¶
- derep_prefix:
Bool
Merge sequences with identical prefixes. If a sequence is identical to the prefix of two or more longer sequences, it is clustered with the shortest of them. If they are equally long, it is clustered with the most abundant.[default:
False
]- min_seq_length:
Int
%
Range
(1, None)
Discard sequences shorter than this integer.[default:
1
]- min_unique_size:
Int
%
Range
(1, None)
Discard sequences with a post-dereplication abundance value smaller than integer.[default:
1
]
Outputs¶
- dereplicated_table:
FeatureTable[Frequency]
The table of dereplicated sequences.[required]
- dereplicated_sequences:
FeatureData[Sequence]
The dereplicated sequences.[required]
vsearch merge-pairs¶
Merge paired-end sequence reads using vsearch's merge_pairs function. See the vsearch documentation for details on how paired-end merging is performed, and for more information on the parameters to this method.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[PairedEndSequencesWithQuality]
The demultiplexed paired-end sequences to be merged.[required]
Parameters¶
- truncqual:
Int
%
Range
(0, None)
Truncate sequences at the first base with the specified quality score value or lower.[optional]
- minlen:
Int
%
Range
(0, None)
Sequences shorter than minlen after truncation are discarded.[default:
1
]- maxns:
Int
%
Range
(0, None)
Sequences with more than maxns N characters are discarded.[optional]
- allowmergestagger:
Bool
Allow merging of staggered read pairs.[default:
False
]- minovlen:
Int
%
Range
(5, None)
Minimum length of the area of overlap between reads during merging.[default:
10
]- maxdiffs:
Int
%
Range
(0, None)
Maximum number of mismatches in the area of overlap during merging.[default:
10
]- minmergelen:
Int
%
Range
(0, None)
Minimum length of the merged read to be retained.[optional]
- maxmergelen:
Int
%
Range
(0, None)
Maximum length of the merged read to be retained.[optional]
- maxee:
Float
%
Range
(0.0, None)
Maximum number of expected errors in the merged read to be retained.[optional]
- threads:
Threads
The number of threads to use for computation. Does not scale much past 4 threads.[default:
1
]
Outputs¶
- merged_sequences:
SampleData[JoinedSequencesWithQuality]
The merged sequences.[required]
- unmerged_sequences:
SampleData[PairedEndSequencesWithQuality]
The unmerged paired-end reads.[required]
vsearch uchime-ref¶
Apply the vsearch uchime_ref method to identify chimeric feature sequences. The results of this method can be used to filter chimeric features from the corresponding feature table. For additional details, please refer to the vsearch documentation.
Citations¶
Inputs¶
- sequences:
FeatureData[Sequence]
The feature sequences to be chimera-checked.[required]
- table:
FeatureTable[Frequency]
Feature table (used for computing total feature abundances).[required]
- reference_sequences:
FeatureData[Sequence]
The non-chimeric reference sequences.[required]
Parameters¶
- dn:
Float
%
Range
(0.0, None)
No vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default:
1.4
]- mindiffs:
Int
%
Range
(1, None)
Minimum number of differences per segment.[default:
3
]- mindiv:
Float
%
Range
(0.0, None)
Minimum divergence from closest parent.[default:
0.8
]- minh:
Float
%
Range
(0.0, 1.0, inclusive_end=True)
Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity.[default:
0.28
]- xn:
Float
%
Range
(1.0, None, inclusive_start=False)
No vote weight, corresponding to the parameter beta in the scoring function.[default:
8.0
]- threads:
Threads
The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default:
1
]
Outputs¶
- chimeras:
FeatureData[Sequence]
The chimeric sequences.[required]
- nonchimeras:
FeatureData[Sequence]
The non-chimeric sequences.[required]
- stats:
UchimeStats
Summary statistics from chimera checking.[required]
vsearch uchime-denovo¶
Apply the vsearch uchime_denovo method to identify chimeric feature sequences. The results of this method can be used to filter chimeric features from the corresponding feature table. For more details, please refer to the vsearch documentation.
Citations¶
Inputs¶
- sequences:
FeatureData[Sequence]
The feature sequences to be chimera-checked.[required]
- table:
FeatureTable[Frequency]
Feature table (used for computing total feature abundances).[required]
Parameters¶
- dn:
Float
%
Range
(0.0, None)
No vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default:
1.4
]- mindiffs:
Int
%
Range
(1, None)
Minimum number of differences per segment.[default:
3
]- mindiv:
Float
%
Range
(0.0, None)
Minimum divergence from closest parent.[default:
0.8
]- minh:
Float
%
Range
(0.0, 1.0, inclusive_end=True)
Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity.[default:
0.28
]- xn:
Float
%
Range
(1.0, None, inclusive_start=False)
No vote weight, corresponding to the parameter beta in the scoring function.[default:
8.0
]
Outputs¶
- chimeras:
FeatureData[Sequence]
The chimeric sequences.[required]
- nonchimeras:
FeatureData[Sequence]
The non-chimeric sequences.[required]
- stats:
UchimeStats
Summary statistics from chimera checking.[required]
vsearch fastq-stats¶
A fastq overview via vsearch's fastq_stats, fastq_eestats and fastq_eestats2 utilities. Please see https://
Citations¶
Inputs¶
- sequences:
SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]
Fastq sequences[required]
Parameters¶
- threads:
Threads
The number of threads used for computation.[default:
1
]
Outputs¶
- visualization:
Visualization
<no description>[required]
vsearch cluster-features-open-reference¶
Given a feature table and the associated feature sequences, cluster the features against a reference database based on user-specified percent identity threshold of their sequences. Any sequences that don't match are then clustered de novo. This is not a general-purpose clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. For features that match a reference sequence, the centroid feature is that reference sequence, so its identifier will become the feature identifier. The clustered_sequences result will contain feature representative sequences that are derived from the sequences input for all features in clustered_table. This will always be the most abundant sequence in the cluster. The new_reference_sequences result will contain the entire reference database, plus feature representative sequences for any de novo features. This is intended to be used as a reference database in subsequent iterations of cluster_features_open_reference, if applicable. See the vsearch documentation for details on how sequence clustering is performed.
Citations¶
Rognes et al., 2016; Rideout et al., 2014
Inputs¶
- sequences:
FeatureData[Sequence]
The sequences corresponding to the features in table.[required]
- table:
FeatureTable[Frequency]
The feature table to be clustered.[required]
- reference_sequences:
FeatureData[Sequence]
The sequences to use as cluster centroids.[required]
Parameters¶
- perc_identity:
Float % Range(0, 1, inclusive_start=False, inclusive_end=True)
The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]
- strand:
Str % Choices('plus', 'both')
Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
- threads:
Threads
The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]
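The range on perc_identity excludes 0 and includes 1, that is, the half-open interval (0, 1]. A minimal sketch of what that means for candidate values follows; `check_perc_identity` is a hypothetical helper written for this illustration, not the validation code QIIME 2 itself runs.

```python
def check_perc_identity(value):
    """Return True if value lies in (0, 1]: start excluded, end included."""
    return 0 < value <= 1


# 0.0 is rejected (exclusive start); 1.0 is accepted (inclusive end).
for candidate in (0.0, 0.97, 1.0, 1.5):
    print(candidate, check_perc_identity(candidate))
```

So a value such as 0.97 is given as a fraction rather than as 97, and 1.0 (exact identity) is allowed.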
Outputs¶
- clustered_table:
FeatureTable[Frequency]
The table following clustering of features.[required]
- clustered_sequences:
FeatureData[Sequence]
Sequences representing clustered features.[required]
- new_reference_sequences:
FeatureData[Sequence]
The new reference sequences. This can be used for subsequent runs of open-reference clustering for consistent definitions of features across open-reference feature tables.[required]
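As described for this action, new_reference_sequences holds the entire reference database plus the representative sequences of any de novo features. The sketch below shows that composition with plain dictionaries. The `build_new_reference` helper, the identifiers, and the sequences are all made up for illustration, and raising on an identifier collision is a choice of this sketch, not documented plugin behavior.

```python
def build_new_reference(reference_sequences, de_novo_representatives):
    """Combine a reference database with de novo representative sequences.

    Both arguments are {feature_id: sequence} dictionaries (illustrative layout).
    """
    new_reference = dict(reference_sequences)  # keep every reference entry
    for feature_id, sequence in de_novo_representatives.items():
        if feature_id in new_reference:
            # Sketch-only safeguard: refuse to overwrite a reference entry.
            raise ValueError(f"duplicate feature id: {feature_id}")
        new_reference[feature_id] = sequence
    return new_reference


reference = {"ref_A": "ACGTACGT", "ref_B": "TTGACCAA"}
de_novo = {"f3": "GGGTTTCC"}

new_reference = build_new_reference(reference, de_novo)
print(sorted(new_reference))
```

Passing the combined set as the reference in a later run is what keeps feature definitions consistent across open-reference feature tables, since features that were de novo in the first run can be matched by reference in the next.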
This plugin wraps the vsearch application, and provides methods for clustering and dereplicating features and sequences.
- version:
2024.10.0
- website: https://
github .com /qiime2 /q2 -vsearch - user support:
- Please post to the QIIME 2 forum for help with this plugin: https://
forum .qiime2 .org - citations:
- Rognes et al., 2016
Actions¶
Name | Type | Short Description |
---|---|---|
cluster-features-de-novo | method | De novo clustering of features. |
cluster-features-closed-reference | method | Closed-reference clustering of features. |
dereplicate-sequences | method | Dereplicate sequences. |
merge-pairs | method | Merge paired-end reads. |
uchime-ref | method | Reference-based chimera filtering with vsearch. |
uchime-denovo | method | De novo chimera filtering with vsearch. |
fastq-stats | visualizer | Fastq stats with vsearch. |
cluster-features-open-reference | pipeline | Open-reference clustering of features. |
Artifact Classes¶
UchimeStats |
Formats¶
UchimeStatsFmt |
UchimeStatsDirFmt |
vsearch cluster-features-de-novo¶
Given a feature table and the associated feature sequences, cluster the features based on user-specified percent identity threshold of their sequences. This is not a general-purpose de novo clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers and sequences will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.
Citations¶
Inputs¶
- sequences:
FeatureData[Sequence]
The sequences corresponding to the features in table.[required]
- table:
FeatureTable[Frequency]
The feature table to be clustered.[required]
Parameters¶
- perc_identity:
Float
%
Range
(0, 1, inclusive_start=False, inclusive_end=True)
The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]
- strand:
Str
%
Choices
('plus', 'both')
Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default:
'plus'
]- threads:
Threads
The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default:
1
]
Outputs¶
- clustered_table:
FeatureTable[Frequency]
The table following clustering of features.[required]
- clustered_sequences:
FeatureData[Sequence]
Sequences representing clustered features.[required]
Examples¶
cluster_features_de_novo¶
wget -O 'seqs1.qza' \
'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
wget -O 'table1.qza' \
'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
qiime vsearch cluster-features-de-novo \
--i-sequences seqs1.qza \
--i-table table1.qza \
--p-perc-identity 0.97 \
--p-strand plus \
--p-threads 1 \
--o-clustered-table clustered-table.qza \
--o-clustered-sequences clustered-sequences.qza
from qiime2 import Artifact
from urllib import request
import qiime2.plugins.vsearch.actions as vsearch_actions
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn = 'seqs1.qza'
request.urlretrieve(url, fn)
seqs1 = Artifact.load(fn)
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn = 'table1.qza'
request.urlretrieve(url, fn)
table1 = Artifact.load(fn)
clustered_table, clustered_sequences = vsearch_actions.cluster_features_de_novo(
sequences=seqs1,
table=table1,
perc_identity=0.97,
strand='plus',
threads=1,
)
- Using the
Upload Data
tool: - On the first tab (Regular), press the
Paste/Fetch
data button at the bottom.- Set "Name" (first text-field) to:
seqs1.qza
- In the larger text-area, copy-and-paste: https://
amplicon -docs .qiime2 .org /en /latest /data /examples /vsearch /cluster -features -de -novo /1 /seqs1 .qza - ("Type", "Genome", and "Settings" can be ignored)
- Set "Name" (first text-field) to:
- Press the
Start
button at the bottom.
- On the first tab (Regular), press the
- Using the
Upload Data
tool: - On the first tab (Regular), press the
Paste/Fetch
data button at the bottom.- Set "Name" (first text-field) to:
table1.qza
- In the larger text-area, copy-and-paste: https://
amplicon -docs .qiime2 .org /en /latest /data /examples /vsearch /cluster -features -de -novo /1 /table1 .qza - ("Type", "Genome", and "Settings" can be ignored)
- Set "Name" (first text-field) to:
- Press the
Start
button at the bottom.
- On the first tab (Regular), press the
- Using the
qiime2 vsearch cluster-features-de-novo
tool: - Set "sequences" to
#: seqs1.qza
- Set "table" to
#: table1.qza
- Set "perc_identity" to
0.97
- Expand the
additional options
section- Leave "strand" as its default value of
plus
- Leave "strand" as its default value of
- Press the
Execute
button.
- Set "sequences" to
library(reticulate)
Artifact <- import("qiime2")$Artifact
request <- import("urllib")$request
vsearch_actions <- import("qiime2.plugins.vsearch.actions")
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn <- 'seqs1.qza'
request$urlretrieve(url, fn)
seqs1 <- Artifact$load(fn)
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn <- 'table1.qza'
request$urlretrieve(url, fn)
table1 <- Artifact$load(fn)
action_results <- vsearch_actions$cluster_features_de_novo(
sequences=seqs1,
table=table1,
perc_identity=0.97,
strand='plus',
threads=1L,
)
clustered_table <- action_results$clustered_table
clustered_sequences <- action_results$clustered_sequences
from q2_vsearch._examples import cluster_features_de_novo
cluster_features_de_novo(use)
vsearch cluster-features-closed-reference¶
Given a feature table and the associated feature sequences, cluster the features against a reference database based on user-specified percent identity threshold of their sequences. This is not a general-purpose closed-reference clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.
Citations¶
Inputs¶
- sequences:
FeatureData[Sequence]
The sequences corresponding to the features in table.[required]
- table:
FeatureTable[Frequency]
The feature table to be clustered.[required]
- reference_sequences:
FeatureData[Sequence]
The sequences to use as cluster centroids.[required]
Parameters¶
- perc_identity:
Float
%
Range
(0, 1, inclusive_start=False, inclusive_end=True)
The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]
- strand:
Str
%
Choices
('plus', 'both')
Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default:
'plus'
]- threads:
Threads
The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default:
1
]
Outputs¶
- clustered_table:
FeatureTable[Frequency]
The table following clustering of features.[required]
- clustered_sequences:
FeatureData[Sequence]
The sequences representing clustered features, relabeled by the reference IDs.[required]
- unmatched_sequences:
FeatureData[Sequence]
The sequences which failed to match any reference sequences. This output maps to vsearch's --notmatched parameter.[required]
vsearch dereplicate-sequences¶
Dereplicate sequence data and create a feature table and feature representative sequences. Feature identifiers in the resulting artifacts will be the sha1 hash of the sequence defining each feature. If clustering of features into OTUs is desired, the resulting artifacts can be passed to the cluster_features_* methods in this plugin.
Citations¶
Inputs¶
- sequences:
SampleData[Sequences]
|
SampleData[SequencesWithQuality]
|
SampleData[JoinedSequencesWithQuality]
The sequences to be dereplicated.[required]
Parameters¶
- derep_prefix:
Bool
Merge sequences with identical prefixes. If a sequence is identical to the prefix of two or more longer sequences, it is clustered with the shortest of them. If they are equally long, it is clustered with the most abundant.[default:
False
]- min_seq_length:
Int
%
Range
(1, None)
Discard sequences shorter than this integer.[default:
1
]- min_unique_size:
Int
%
Range
(1, None)
Discard sequences with a post-dereplication abundance value smaller than integer.[default:
1
]
Outputs¶
- dereplicated_table:
FeatureTable[Frequency]
The table of dereplicated sequences.[required]
- dereplicated_sequences:
FeatureData[Sequence]
The dereplicated sequences.[required]
vsearch merge-pairs¶
Merge paired-end sequence reads using vsearch's merge_pairs function. See the vsearch documentation for details on how paired-end merging is performed, and for more information on the parameters to this method.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[PairedEndSequencesWithQuality]
The demultiplexed paired-end sequences to be merged.[required]
Parameters¶
- truncqual:
Int
%
Range
(0, None)
Truncate sequences at the first base with the specified quality score value or lower.[optional]
- minlen:
Int
%
Range
(0, None)
Sequences shorter than minlen after truncation are discarded.[default:
1
]- maxns:
Int
%
Range
(0, None)
Sequences with more than maxns N characters are discarded.[optional]
- allowmergestagger:
Bool
Allow merging of staggered read pairs.[default:
False
]- minovlen:
Int
%
Range
(5, None)
Minimum length of the area of overlap between reads during merging.[default:
10
]- maxdiffs:
Int
%
Range
(0, None)
Maximum number of mismatches in the area of overlap during merging.[default:
10
]- minmergelen:
Int
%
Range
(0, None)
Minimum length of the merged read to be retained.[optional]
- maxmergelen:
Int
%
Range
(0, None)
Maximum length of the merged read to be retained.[optional]
- maxee:
Float
%
Range
(0.0, None)
Maximum number of expected errors in the merged read to be retained.[optional]
- threads:
Threads
The number of threads to use for computation. Does not scale much past 4 threads.[default:
1
]
Outputs¶
- merged_sequences:
SampleData[JoinedSequencesWithQuality]
The merged sequences.[required]
- unmerged_sequences:
SampleData[PairedEndSequencesWithQuality]
The unmerged paired-end reads.[required]
vsearch uchime-ref¶
Apply the vsearch uchime_ref method to identify chimeric feature sequences. The results of this method can be used to filter chimeric features from the corresponding feature table. For additional details, please refer to the vsearch documentation.
Citations¶
Inputs¶
- sequences:
FeatureData[Sequence]
The feature sequences to be chimera-checked.[required]
- table:
FeatureTable[Frequency]
Feature table (used for computing total feature abundances).[required]
- reference_sequences:
FeatureData[Sequence]
The non-chimeric reference sequences.[required]
Parameters¶
- dn:
Float
%
Range
(0.0, None)
No vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default:
1.4
]- mindiffs:
Int
%
Range
(1, None)
Minimum number of differences per segment.[default:
3
]- mindiv:
Float
%
Range
(0.0, None)
Minimum divergence from closest parent.[default:
0.8
]- minh:
Float
%
Range
(0.0, 1.0, inclusive_end=True)
Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity.[default:
0.28
]- xn:
Float
%
Range
(1.0, None, inclusive_start=False)
No vote weight, corresponding to the parameter beta in the scoring function.[default:
8.0
]- threads:
Threads
The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default:
1
]
Outputs¶
- chimeras:
FeatureData[Sequence]
The chimeric sequences.[required]
- nonchimeras:
FeatureData[Sequence]
The non-chimeric sequences.[required]
- stats:
UchimeStats
Summary statistics from chimera checking.[required]
vsearch uchime-denovo¶
Apply the vsearch uchime_denovo method to identify chimeric feature sequences. The results of this method can be used to filter chimeric features from the corresponding feature table. For more details, please refer to the vsearch documentation.
Citations¶
Inputs¶
- sequences: FeatureData[Sequence]
  The feature sequences to be chimera-checked. [required]
- table: FeatureTable[Frequency]
  Feature table (used for computing total feature abundances). [required]
Parameters¶
- dn: Float % Range(0.0, None)
  No vote pseudo-count, corresponding to the parameter n in the chimera scoring function. [default: 1.4]
- mindiffs: Int % Range(1, None)
  Minimum number of differences per segment. [default: 3]
- mindiv: Float % Range(0.0, None)
  Minimum divergence from closest parent. [default: 0.8]
- minh: Float % Range(0.0, 1.0, inclusive_end=True)
  Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity. [default: 0.28]
- xn: Float % Range(1.0, None, inclusive_start=False)
  No vote weight, corresponding to the parameter beta in the scoring function. [default: 8.0]
Outputs¶
- chimeras: FeatureData[Sequence]
  The chimeric sequences. [required]
- nonchimeras: FeatureData[Sequence]
  The non-chimeric sequences. [required]
- stats: UchimeStats
  Summary statistics from chimera checking. [required]
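The description above notes that the results can be used to filter chimeric features from the corresponding feature table. The idea can be sketched in plain Python as follows. The nested dictionaries are a toy stand-in for a FeatureTable[Frequency], and `filter_chimeras` is a hypothetical helper, not part of this plugin; in practice a QIIME 2 table-filtering action would be used.

```python
def filter_chimeras(table, nonchimera_ids):
    """Keep only the features whose IDs appear in nonchimera_ids.

    table maps sample ID -> {feature ID -> frequency}.
    """
    keep = set(nonchimera_ids)
    return {
        sample: {fid: freq for fid, freq in counts.items() if fid in keep}
        for sample, counts in table.items()
    }


if __name__ == "__main__":
    # Toy data: feature "f3" is assumed to have been flagged as chimeric.
    table = {
        "sample1": {"f1": 10, "f2": 4, "f3": 1},
        "sample2": {"f1": 7, "f3": 2},
    }
    print(filter_chimeras(table, ["f1", "f2"]))
```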
vsearch fastq-stats¶
A fastq overview via vsearch's fastq_stats, fastq_eestats and fastq_eestats2 utilities. Please see https://
Citations¶
Inputs¶
- sequences: SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]
  Fastq sequences [required]
Parameters¶
- threads: Threads
  The number of threads used for computation. [default: 1]
Outputs¶
- visualization: Visualization
  <no description> [required]
vsearch cluster-features-open-reference¶
Given a feature table and the associated feature sequences, cluster the features against a reference database based on a user-specified percent identity threshold of their sequences. Any sequences that don't match are then clustered de novo. This is not a general-purpose clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at.
When a group of features in the input table is clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. For features that match a reference sequence, the centroid feature is that reference sequence, so its identifier will become the feature identifier.
The clustered_sequences result will contain feature representative sequences that are derived from the sequences input for all features in clustered_table. This will always be the most abundant sequence in the cluster. The new_reference_sequences result will contain the entire reference database, plus feature representative sequences for any de novo features. This is intended to be used as a reference database in subsequent iterations of cluster_features_open_reference, if applicable. See the vsearch documentation for details on how sequence clustering is performed.
Citations¶
Rognes et al., 2016; Rideout et al., 2014
Inputs¶
- sequences: FeatureData[Sequence]
  The sequences corresponding to the features in table. [required]
- table: FeatureTable[Frequency]
  The feature table to be clustered. [required]
- reference_sequences: FeatureData[Sequence]
  The sequences to use as cluster centroids. [required]
Parameters¶
- perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True)
  The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter. [required]
- strand: Str % Choices('plus', 'both')
  Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands. [default: 'plus']
- threads: Threads
  The number of threads to use for computation. Passing 0 will launch one thread per CPU core. [default: 1]
Outputs¶
- clustered_table: FeatureTable[Frequency]
  The table following clustering of features. [required]
- clustered_sequences: FeatureData[Sequence]
  Sequences representing clustered features. [required]
- new_reference_sequences: FeatureData[Sequence]
  The new reference sequences. This can be used for subsequent runs of open-reference clustering for consistent definitions of features across open-reference feature tables. [required]
vsearch cluster-features-de-novo¶
Given a feature table and the associated feature sequences, cluster the features based on a user-specified percent identity threshold of their sequences. This is not a general-purpose de novo clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table is clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers and sequences will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.
Citations¶
Inputs¶
- sequences: FeatureData[Sequence]
  The sequences corresponding to the features in table. [required]
- table: FeatureTable[Frequency]
  The feature table to be clustered. [required]
Parameters¶
- perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True)
  The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter. [required]
- strand: Str % Choices('plus', 'both')
  Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands. [default: 'plus']
- threads: Threads
  The number of threads to use for computation. Passing 0 will launch one thread per CPU core. [default: 1]
Outputs¶
- clustered_table: FeatureTable[Frequency]
  The table following clustering of features. [required]
- clustered_sequences: FeatureData[Sequence]
  Sequences representing clustered features. [required]
Examples¶
cluster_features_de_novo¶
wget -O 'seqs1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
wget -O 'table1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
qiime vsearch cluster-features-de-novo \
  --i-sequences seqs1.qza \
  --i-table table1.qza \
  --p-perc-identity 0.97 \
  --p-strand plus \
  --p-threads 1 \
  --o-clustered-table clustered-table.qza \
  --o-clustered-sequences clustered-sequences.qza
from qiime2 import Artifact
from urllib import request
import qiime2.plugins.vsearch.actions as vsearch_actions
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn = 'seqs1.qza'
request.urlretrieve(url, fn)
seqs1 = Artifact.load(fn)
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn = 'table1.qza'
request.urlretrieve(url, fn)
table1 = Artifact.load(fn)
clustered_table, clustered_sequences = vsearch_actions.cluster_features_de_novo(
    sequences=seqs1,
    table=table1,
    perc_identity=0.97,
    strand='plus',
    threads=1,
)
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: seqs1.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: table1.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 vsearch cluster-features-de-novo tool:
  - Set "sequences" to #: seqs1.qza
  - Set "table" to #: table1.qza
  - Set "perc_identity" to 0.97
  - Expand the additional options section
    - Leave "strand" as its default value of plus
  - Press the Execute button.
library(reticulate)
Artifact <- import("qiime2")$Artifact
request <- import("urllib")$request
vsearch_actions <- import("qiime2.plugins.vsearch.actions")
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn <- 'seqs1.qza'
request$urlretrieve(url, fn)
seqs1 <- Artifact$load(fn)
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn <- 'table1.qza'
request$urlretrieve(url, fn)
table1 <- Artifact$load(fn)
action_results <- vsearch_actions$cluster_features_de_novo(
  sequences=seqs1,
  table=table1,
  perc_identity=0.97,
  strand='plus',
  threads=1L,
)
clustered_table <- action_results$clustered_table
clustered_sequences <- action_results$clustered_sequences
from q2_vsearch._examples import cluster_features_de_novo
cluster_features_de_novo(use)
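The frequency rule in the description (the clustered feature's frequency in a sample is the sum of the frequencies of its member features in that sample) can be illustrated with a small Python sketch. The nested dictionaries stand in for a FeatureTable[Frequency], and `collapse_table` and the `centroid_of` mapping are hypothetical names used only for this illustration; the actual cluster assignments come from vsearch.

```python
from collections import defaultdict


def collapse_table(table, centroid_of):
    """Sum per-sample frequencies of features that share a centroid.

    table maps sample ID -> {feature ID -> frequency}.
    centroid_of maps each feature ID to the ID of its cluster centroid.
    """
    clustered = {}
    for sample, counts in table.items():
        summed = defaultdict(int)
        for feature_id, freq in counts.items():
            # The clustered feature inherits the centroid's identifier.
            summed[centroid_of[feature_id]] += freq
        clustered[sample] = dict(summed)
    return clustered


if __name__ == "__main__":
    # Toy data: f2 is assumed to cluster with centroid f1.
    table = {"sample1": {"f1": 5, "f2": 3, "f3": 2}, "sample2": {"f2": 4}}
    centroid_of = {"f1": "f1", "f2": "f1", "f3": "f3"}
    print(collapse_table(table, centroid_of))
```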
vsearch cluster-features-closed-reference¶
Given a feature table and the associated feature sequences, cluster the features against a reference database based on a user-specified percent identity threshold of their sequences. This is not a general-purpose closed-reference clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table is clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.
Citations¶
Inputs¶
- sequences: FeatureData[Sequence]
  The sequences corresponding to the features in table. [required]
- table: FeatureTable[Frequency]
  The feature table to be clustered. [required]
- reference_sequences: FeatureData[Sequence]
  The sequences to use as cluster centroids. [required]
Parameters¶
- perc_identity: Float % Range(0, 1, inclusive_start=False, inclusive_end=True)
  The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter. [required]
- strand: Str % Choices('plus', 'both')
  Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands. [default: 'plus']
- threads: Threads
  The number of threads to use for computation. Passing 0 will launch one thread per CPU core. [default: 1]
Outputs¶
- clustered_table: FeatureTable[Frequency]
  The table following clustering of features. [required]
- clustered_sequences: FeatureData[Sequence]
  The sequences representing clustered features, relabeled by the reference IDs. [required]
- unmatched_sequences: FeatureData[Sequence]
  The sequences which failed to match any reference sequences. This output maps to vsearch's --notmatched parameter. [required]
vsearch dereplicate-sequences¶
Dereplicate sequence data and create a feature table and feature representative sequences. Feature identifiers in the resulting artifacts will be the sha1 hash of the sequence defining each feature. If clustering of features into OTUs is desired, the resulting artifacts can be passed to the cluster_features_* methods in this plugin.
Citations¶
Inputs¶
- sequences: SampleData[Sequences] | SampleData[SequencesWithQuality] | SampleData[JoinedSequencesWithQuality]
  The sequences to be dereplicated. [required]
Parameters¶
- derep_prefix: Bool
  Merge sequences with identical prefixes. If a sequence is identical to the prefix of two or more longer sequences, it is clustered with the shortest of them. If they are equally long, it is clustered with the most abundant. [default: False]
- min_seq_length: Int % Range(1, None)
  Discard sequences shorter than this integer. [default: 1]
- min_unique_size: Int % Range(1, None)
  Discard sequences with a post-dereplication abundance value smaller than this integer. [default: 1]
Outputs¶
- dereplicated_table: FeatureTable[Frequency]
  The table of dereplicated sequences. [required]
- dereplicated_sequences: FeatureData[Sequence]
  The dereplicated sequences. [required]
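As a rough illustration of full-length dereplication with sha1-based feature identifiers, the following Python sketch counts identical sequences per sample and labels each unique sequence with the sha1 hash of its text. The `dereplicate` helper and the input layout are hypothetical, and details such as case handling before hashing are assumptions; the plugin delegates the actual work to vsearch.

```python
import hashlib
from collections import Counter


def dereplicate(reads_by_sample):
    """Collapse identical reads and count them per sample.

    reads_by_sample maps sample ID -> list of sequence strings.
    Returns (table, sequences): table maps sample ID -> {feature ID -> count}
    and sequences maps feature ID -> sequence.
    """
    table = {}
    sequences = {}
    for sample, reads in reads_by_sample.items():
        counts = Counter()
        for seq in reads:
            # Feature ID: hex digest of the sha1 hash of the sequence text.
            feature_id = hashlib.sha1(seq.encode("utf-8")).hexdigest()
            sequences[feature_id] = seq
            counts[feature_id] += 1
        table[sample] = dict(counts)
    return table, sequences


if __name__ == "__main__":
    # Toy reads, chosen only for illustration.
    reads = {"sample1": ["ACGT", "ACGT", "TTGA"], "sample2": ["ACGT"]}
    table, sequences = dereplicate(reads)
    print(len(sequences), table)
```

Because the identifier depends only on the sequence, the same sequence receives the same feature ID in every sample and in every run.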
vsearch merge-pairs¶
Merge paired-end sequence reads using vsearch's merge_pairs function. See the vsearch documentation for details on how paired-end merging is performed, and for more information on the parameters to this method.
Citations¶
Inputs¶
- demultiplexed_seqs: SampleData[PairedEndSequencesWithQuality]
  The demultiplexed paired-end sequences to be merged. [required]
Parameters¶
- truncqual: Int % Range(0, None)
  Truncate sequences at the first base with the specified quality score value or lower. [optional]
- minlen: Int % Range(0, None)
  Sequences shorter than minlen after truncation are discarded. [default: 1]
- maxns: Int % Range(0, None)
  Sequences with more than maxns N characters are discarded. [optional]
- allowmergestagger: Bool
  Allow merging of staggered read pairs. [default: False]
- minovlen: Int % Range(5, None)
  Minimum length of the area of overlap between reads during merging. [default: 10]
- maxdiffs: Int % Range(0, None)
  Maximum number of mismatches in the area of overlap during merging. [default: 10]
- minmergelen: Int % Range(0, None)
  Minimum length of the merged read to be retained. [optional]
- maxmergelen: Int % Range(0, None)
  Maximum length of the merged read to be retained. [optional]
- maxee: Float % Range(0.0, None)
  Maximum number of expected errors in the merged read to be retained. [optional]
- threads: Threads
  The number of threads to use for computation. Does not scale much past 4 threads. [default: 1]
Outputs¶
- merged_sequences: SampleData[JoinedSequencesWithQuality]
  The merged sequences. [required]
- unmerged_sequences: SampleData[PairedEndSequencesWithQuality]
  The unmerged paired-end reads. [required]
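The maxee parameter refers to the expected number of errors in a read, which is conventionally the sum of the per-base error probabilities implied by the quality scores, with p = 10^(-Q/10) for Phred score Q. The Python sketch below computes that sum for a FASTQ quality string. It assumes Phred+33 encoding, and `expected_errors` and `keep_read` are hypothetical helpers; treating a read whose expected errors equal maxee as retained is also an assumption.

```python
def expected_errors(quality, offset=33):
    """Sum the per-base error probabilities encoded in a FASTQ quality string."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def keep_read(quality, maxee):
    """Retain a read when its expected errors do not exceed maxee."""
    return expected_errors(quality) <= maxee


if __name__ == "__main__":
    # "I" encodes Q40 and "+" encodes Q10 under the assumed Phred+33 offset.
    print(expected_errors("IIIIIIIIII"))
    print(keep_read("+" * 20, maxee=1.0))
```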
This plugin wraps the vsearch application, and provides methods for clustering and dereplicating features and sequences.
- version:
2024.10.0
- website: https://
github .com /qiime2 /q2 -vsearch - user support:
- Please post to the QIIME 2 forum for help with this plugin: https://
forum .qiime2 .org - citations:
- Rognes et al., 2016
Actions¶
Name | Type | Short Description |
---|---|---|
cluster-features-de-novo | method | De novo clustering of features. |
cluster-features-closed-reference | method | Closed-reference clustering of features. |
dereplicate-sequences | method | Dereplicate sequences. |
merge-pairs | method | Merge paired-end reads. |
uchime-ref | method | Reference-based chimera filtering with vsearch. |
uchime-denovo | method | De novo chimera filtering with vsearch. |
fastq-stats | visualizer | Fastq stats with vsearch. |
cluster-features-open-reference | pipeline | Open-reference clustering of features. |
Artifact Classes¶
UchimeStats |
Formats¶
UchimeStatsFmt |
UchimeStatsDirFmt |
vsearch cluster-features-de-novo¶
Given a feature table and the associated feature sequences, cluster the features based on user-specified percent identity threshold of their sequences. This is not a general-purpose de novo clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers and sequences will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.
Citations¶
Inputs¶
- sequences:
FeatureData[Sequence]
The sequences corresponding to the features in table.[required]
- table:
FeatureTable[Frequency]
The feature table to be clustered.[required]
Parameters¶
- perc_identity:
Float
%
Range
(0, 1, inclusive_start=False, inclusive_end=True)
The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]
- strand:
Str
%
Choices
('plus', 'both')
Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default:
'plus'
]- threads:
Threads
The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default:
1
]
Outputs¶
- clustered_table:
FeatureTable[Frequency]
The table following clustering of features.[required]
- clustered_sequences:
FeatureData[Sequence]
Sequences representing clustered features.[required]
Examples¶
cluster_features_de_novo¶
wget -O 'seqs1.qza' \
'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
wget -O 'table1.qza' \
'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
qiime vsearch cluster-features-de-novo \
--i-sequences seqs1.qza \
--i-table table1.qza \
--p-perc-identity 0.97 \
--p-strand plus \
--p-threads 1 \
--o-clustered-table clustered-table.qza \
--o-clustered-sequences clustered-sequences.qza
from qiime2 import Artifact
from urllib import request
import qiime2.plugins.vsearch.actions as vsearch_actions
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn = 'seqs1.qza'
request.urlretrieve(url, fn)
seqs1 = Artifact.load(fn)
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn = 'table1.qza'
request.urlretrieve(url, fn)
table1 = Artifact.load(fn)
clustered_table, clustered_sequences = vsearch_actions.cluster_features_de_novo(
sequences=seqs1,
table=table1,
perc_identity=0.97,
strand='plus',
threads=1,
)
- Using the
Upload Data
tool: - On the first tab (Regular), press the
Paste/Fetch
data button at the bottom.- Set "Name" (first text-field) to:
seqs1.qza
- In the larger text-area, copy-and-paste: https://
amplicon -docs .qiime2 .org /en /latest /data /examples /vsearch /cluster -features -de -novo /1 /seqs1 .qza - ("Type", "Genome", and "Settings" can be ignored)
- Set "Name" (first text-field) to:
- Press the
Start
button at the bottom.
- On the first tab (Regular), press the
- Using the
Upload Data
tool: - On the first tab (Regular), press the
Paste/Fetch
data button at the bottom.- Set "Name" (first text-field) to:
table1.qza
- In the larger text-area, copy-and-paste: https://
amplicon -docs .qiime2 .org /en /latest /data /examples /vsearch /cluster -features -de -novo /1 /table1 .qza - ("Type", "Genome", and "Settings" can be ignored)
- Set "Name" (first text-field) to:
- Press the
Start
button at the bottom.
- On the first tab (Regular), press the
- Using the
qiime2 vsearch cluster-features-de-novo
tool: - Set "sequences" to
#: seqs1.qza
- Set "table" to
#: table1.qza
- Set "perc_identity" to
0.97
- Expand the
additional options
section- Leave "strand" as its default value of
plus
- Leave "strand" as its default value of
- Press the
Execute
button.
- Set "sequences" to
library(reticulate)
Artifact <- import("qiime2")$Artifact
request <- import("urllib")$request
vsearch_actions <- import("qiime2.plugins.vsearch.actions")
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn <- 'seqs1.qza'
request$urlretrieve(url, fn)
seqs1 <- Artifact$load(fn)
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn <- 'table1.qza'
request$urlretrieve(url, fn)
table1 <- Artifact$load(fn)
action_results <- vsearch_actions$cluster_features_de_novo(
sequences=seqs1,
table=table1,
perc_identity=0.97,
strand='plus',
threads=1L,
)
clustered_table <- action_results$clustered_table
clustered_sequences <- action_results$clustered_sequences
from q2_vsearch._examples import cluster_features_de_novo
cluster_features_de_novo(use)
vsearch cluster-features-closed-reference¶
Given a feature table and the associated feature sequences, cluster the features against a reference database based on user-specified percent identity threshold of their sequences. This is not a general-purpose closed-reference clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.
Citations¶
Inputs¶
- sequences:
FeatureData[Sequence]
The sequences corresponding to the features in table.[required]
- table:
FeatureTable[Frequency]
The feature table to be clustered.[required]
- reference_sequences:
FeatureData[Sequence]
The sequences to use as cluster centroids.[required]
Parameters¶
- perc_identity:
Float
%
Range
(0, 1, inclusive_start=False, inclusive_end=True)
The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]
- strand:
Str
%
Choices
('plus', 'both')
Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default:
'plus'
]- threads:
Threads
The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default:
1
]
Outputs¶
- clustered_table:
FeatureTable[Frequency]
The table following clustering of features.[required]
- clustered_sequences:
FeatureData[Sequence]
The sequences representing clustered features, relabeled by the reference IDs.[required]
- unmatched_sequences:
FeatureData[Sequence]
The sequences which failed to match any reference sequences. This output maps to vsearch's --notmatched parameter.[required]
vsearch dereplicate-sequences¶
Dereplicate sequence data and create a feature table and feature representative sequences. Feature identifiers in the resulting artifacts will be the sha1 hash of the sequence defining each feature. If clustering of features into OTUs is desired, the resulting artifacts can be passed to the cluster_features_* methods in this plugin.
Citations¶
Inputs¶
- sequences:
SampleData[Sequences]
|
SampleData[SequencesWithQuality]
|
SampleData[JoinedSequencesWithQuality]
The sequences to be dereplicated.[required]
Parameters¶
- derep_prefix:
Bool
Merge sequences with identical prefixes. If a sequence is identical to the prefix of two or more longer sequences, it is clustered with the shortest of them. If they are equally long, it is clustered with the most abundant.[default:
False
]- min_seq_length:
Int
%
Range
(1, None)
Discard sequences shorter than this integer.[default:
1
]- min_unique_size:
Int
%
Range
(1, None)
Discard sequences with a post-dereplication abundance value smaller than integer.[default:
1
]
Outputs¶
- dereplicated_table:
FeatureTable[Frequency]
The table of dereplicated sequences.[required]
- dereplicated_sequences:
FeatureData[Sequence]
The dereplicated sequences.[required]
vsearch merge-pairs¶
Merge paired-end sequence reads using vsearch's merge_pairs function. See the vsearch documentation for details on how paired-end merging is performed, and for more information on the parameters to this method.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[PairedEndSequencesWithQuality]
The demultiplexed paired-end sequences to be merged.[required]
Parameters¶
- truncqual:
Int
%
Range
(0, None)
Truncate sequences at the first base with the specified quality score value or lower.[optional]
- minlen:
Int
%
Range
(0, None)
Sequences shorter than minlen after truncation are discarded.[default:
1
]- maxns:
Int
%
Range
(0, None)
Sequences with more than maxns N characters are discarded.[optional]
- allowmergestagger:
Bool
Allow merging of staggered read pairs.[default:
False
]- minovlen:
Int
%
Range
(5, None)
Minimum length of the area of overlap between reads during merging.[default:
10
]- maxdiffs:
Int
%
Range
(0, None)
Maximum number of mismatches in the area of overlap during merging.[default:
10
]- minmergelen:
Int
%
Range
(0, None)
Minimum length of the merged read to be retained.[optional]
- maxmergelen:
Int
%
Range
(0, None)
Maximum length of the merged read to be retained.[optional]
- maxee:
Float
%
Range
(0.0, None)
Maximum number of expected errors in the merged read to be retained.[optional]
- threads:
Threads
The number of threads to use for computation. Does not scale much past 4 threads.[default:
1
]
Outputs¶
- merged_sequences:
SampleData[JoinedSequencesWithQuality]
The merged sequences.[required]
- unmerged_sequences:
SampleData[PairedEndSequencesWithQuality]
The unmerged paired-end reads.[required]
vsearch uchime-ref¶
Apply the vsearch uchime_ref method to identify chimeric feature sequences. The results of this method can be used to filter chimeric features from the corresponding feature table. For additional details, please refer to the vsearch documentation.
Citations¶
Inputs¶
- sequences:
FeatureData[Sequence]
The feature sequences to be chimera-checked.[required]
- table:
FeatureTable[Frequency]
Feature table (used for computing total feature abundances).[required]
- reference_sequences:
FeatureData[Sequence]
The non-chimeric reference sequences.[required]
Parameters¶
- dn:
Float % Range(0.0, None)
No vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default: 1.4]
- mindiffs:
Int % Range(1, None)
Minimum number of differences per segment.[default: 3]
- mindiv:
Float % Range(0.0, None)
Minimum divergence from closest parent.[default: 0.8]
- minh:
Float % Range(0.0, 1.0, inclusive_end=True)
Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity.[default: 0.28]
- xn:
Float % Range(1.0, None, inclusive_start=False)
No vote weight, corresponding to the parameter beta in the scoring function.[default: 8.0]
- threads:
Threads
The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]
Outputs¶
- chimeras:
FeatureData[Sequence]
The chimeric sequences.[required]
- nonchimeras:
FeatureData[Sequence]
The non-chimeric sequences.[required]
- stats:
UchimeStats
Summary statistics from chimera checking.[required]
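The dn, xn, and minh parameters refer to a chimera scoring function that this page does not spell out. The sketch below assumes the per-segment score from the published UCHIME description, Y / (beta * (N + n) + A), where Y, N, and A are counts of yes, no, and abstain votes. The function name is hypothetical, how vsearch combines segment scores into the final h is not shown, and the exact form vsearch uses should be checked against its documentation.

```python
def segment_score(yes, no, abstain, xn=8.0, dn=1.4):
    """Per-segment chimera score under the assumed UCHIME-style formula.

    xn plays the role of beta (no vote weight) and dn the role of n
    (no vote pseudo-count), matching the parameter descriptions above.
    """
    denominator = xn * (no + dn) + abstain
    if denominator == 0:
        raise ValueError("score is undefined when the denominator is zero")
    return yes / denominator


# Each no vote is weighted by xn, so a couple of no votes lower the score sharply.
print(segment_score(yes=10, no=0, abstain=0))
print(segment_score(yes=10, no=2, abstain=0))
```

With the defaults, raising xn or dn penalizes no votes more heavily, which is consistent with the note that a higher minh reduces false positives at the cost of sensitivity.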
vsearch uchime-denovo¶
Apply the vsearch uchime_denovo method to identify chimeric feature sequences. The results of this method can be used to filter chimeric features from the corresponding feature table. For more details, please refer to the vsearch documentation.
Citations¶
Inputs¶
- sequences:
FeatureData[Sequence]
The feature sequences to be chimera-checked.[required]
- table:
FeatureTable[Frequency]
Feature table (used for computing total feature abundances).[required]
Parameters¶
- dn:
Float % Range(0.0, None)
No vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default: 1.4]
- mindiffs:
Int % Range(1, None)
Minimum number of differences per segment.[default: 3]
- mindiv:
Float % Range(0.0, None)
Minimum divergence from closest parent.[default: 0.8]
- minh:
Float % Range(0.0, 1.0, inclusive_end=True)
Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity.[default: 0.28]
- xn:
Float % Range(1.0, None, inclusive_start=False)
No vote weight, corresponding to the parameter beta in the scoring function.[default: 8.0]
Outputs¶
- chimeras:
FeatureData[Sequence]
The chimeric sequences.[required]
- nonchimeras:
FeatureData[Sequence]
The non-chimeric sequences.[required]
- stats:
UchimeStats
Summary statistics from chimera checking.[required]
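The description above notes that the results can be used to filter chimeric features from the feature table, and that the table supplies total feature abundances. As a rough illustration of both ideas, the sketch below uses plain dictionaries in place of QIIME 2 artifacts. All names and data are hypothetical, and this is not how q2-vsearch is implemented.

```python
def total_abundances(table):
    """Sum each feature's counts across all samples."""
    return {fid: sum(counts.values()) for fid, counts in table.items()}


def filter_table(table, keep_ids):
    """Keep only the features whose identifiers are in keep_ids."""
    return {fid: counts for fid, counts in table.items() if fid in keep_ids}


# Hypothetical feature table: feature id -> {sample id -> frequency}.
table = {
    "f1": {"s1": 10, "s2": 4},
    "f2": {"s1": 1, "s2": 0},
    "f3": {"s1": 3, "s2": 7},
}

# Hypothetical identifiers reported in the nonchimeras output.
nonchimera_ids = {"f1", "f3"}

print(total_abundances(table))
print(filter_table(table, nonchimera_ids))
```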
vsearch fastq-stats¶
A fastq overview via vsearch's fastq_stats, fastq_eestats and fastq_eestats2 utilities. Please see https://
Citations¶
Inputs¶
- sequences:
SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]
Fastq sequences[required]
Parameters¶
- threads:
Threads
The number of threads used for computation.[default: 1]
Outputs¶
- visualization:
Visualization
<no description>[required]
vsearch cluster-features-open-reference¶
Given a feature table and the associated feature sequences, cluster the features against a reference database based on a user-specified percent identity threshold of their sequences. Any sequences that don't match are then clustered de novo. This is not a general-purpose clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table is clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. For features that match a reference sequence, the centroid feature is that reference sequence, so its identifier will become the feature identifier. The clustered_sequences result will contain feature representative sequences that are derived from the sequences input for all features in clustered_table. This will always be the most abundant sequence in the cluster. The new_reference_sequences result will contain the entire reference database, plus feature representative sequences for any de novo features. This is intended to be used as a reference database in subsequent iterations of cluster_features_open_reference, if applicable. See the vsearch documentation for details on how sequence clustering is performed.
Citations¶
Rognes et al., 2016; Rideout et al., 2014
Inputs¶
- sequences:
FeatureData[Sequence]
The sequences corresponding to the features in table.[required]
- table:
FeatureTable[Frequency]
The feature table to be clustered.[required]
- reference_sequences:
FeatureData[Sequence]
The sequences to use as cluster centroids.[required]
Parameters¶
- perc_identity:
Float % Range(0, 1, inclusive_start=False, inclusive_end=True)
The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]
- strand:
Str % Choices('plus', 'both')
Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
- threads:
Threads
The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]
Outputs¶
- clustered_table:
FeatureTable[Frequency]
The table following clustering of features.[required]
- clustered_sequences:
FeatureData[Sequence]
Sequences representing clustered features.[required]
- new_reference_sequences:
FeatureData[Sequence]
The new reference sequences. This can be used for subsequent runs of open-reference clustering for consistent definitions of features across open-reference feature tables.[required]
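The clustering actions describe how frequencies are combined: the clustered feature's frequency in a sample is the sum of the frequencies of its member features in that sample. A minimal sketch of that bookkeeping follows, using plain dictionaries rather than QIIME 2 artifacts. The function name, the feature-to-centroid mapping, and the data are hypothetical.

```python
from collections import defaultdict


def collapse_table(table, feature_to_centroid):
    """Sum per-sample frequencies of all features assigned to one centroid."""
    clustered = defaultdict(lambda: defaultdict(int))
    for feature_id, sample_counts in table.items():
        centroid = feature_to_centroid[feature_id]
        for sample_id, count in sample_counts.items():
            clustered[centroid][sample_id] += count
    return {centroid: dict(counts) for centroid, counts in clustered.items()}


# Hypothetical input table: feature id -> {sample id -> frequency}.
table = {
    "f1": {"s1": 3, "s2": 0},
    "f2": {"s1": 2, "s2": 5},
    "f3": {"s1": 1, "s2": 1},
}

# Hypothetical clustering result: f2 joins the cluster whose centroid is f1.
feature_to_centroid = {"f1": "f1", "f2": "f1", "f3": "f3"}

print(collapse_table(table, feature_to_centroid))
```

The clustered feature keeps the centroid's identifier, which mirrors the statement that identifiers are inherited from the centroid feature of each cluster.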
This plugin wraps the vsearch application, and provides methods for clustering and dereplicating features and sequences.
- version: 2024.10.0
- website: https://github.com/qiime2/q2-vsearch
- user support: Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
- citations: Rognes et al., 2016
Actions¶
Name | Type | Short Description |
---|---|---|
cluster-features-de-novo | method | De novo clustering of features. |
cluster-features-closed-reference | method | Closed-reference clustering of features. |
dereplicate-sequences | method | Dereplicate sequences. |
merge-pairs | method | Merge paired-end reads. |
uchime-ref | method | Reference-based chimera filtering with vsearch. |
uchime-denovo | method | De novo chimera filtering with vsearch. |
fastq-stats | visualizer | Fastq stats with vsearch. |
cluster-features-open-reference | pipeline | Open-reference clustering of features. |
Artifact Classes¶
UchimeStats |
Formats¶
UchimeStatsFmt |
UchimeStatsDirFmt |
vsearch cluster-features-de-novo¶
Given a feature table and the associated feature sequences, cluster the features based on a user-specified percent identity threshold of their sequences. This is not a general-purpose de novo clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table is clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers and sequences will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.
Citations¶
Inputs¶
- sequences:
FeatureData[Sequence]
The sequences corresponding to the features in table.[required]
- table:
FeatureTable[Frequency]
The feature table to be clustered.[required]
Parameters¶
- perc_identity:
Float % Range(0, 1, inclusive_start=False, inclusive_end=True)
The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]
- strand:
Str % Choices('plus', 'both')
Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
- threads:
Threads
The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]
Outputs¶
- clustered_table:
FeatureTable[Frequency]
The table following clustering of features.[required]
- clustered_sequences:
FeatureData[Sequence]
Sequences representing clustered features.[required]
Examples¶
cluster_features_de_novo¶
wget -O 'seqs1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
wget -O 'table1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
qiime vsearch cluster-features-de-novo \
  --i-sequences seqs1.qza \
  --i-table table1.qza \
  --p-perc-identity 0.97 \
  --p-strand plus \
  --p-threads 1 \
  --o-clustered-table clustered-table.qza \
  --o-clustered-sequences clustered-sequences.qza
from qiime2 import Artifact
from urllib import request
import qiime2.plugins.vsearch.actions as vsearch_actions

# Download and load the example feature sequences.
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn = 'seqs1.qza'
request.urlretrieve(url, fn)
seqs1 = Artifact.load(fn)

# Download and load the example feature table.
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn = 'table1.qza'
request.urlretrieve(url, fn)
table1 = Artifact.load(fn)

# Cluster the features de novo at 97% identity.
clustered_table, clustered_sequences = vsearch_actions.cluster_features_de_novo(
    sequences=seqs1,
    table=table1,
    perc_identity=0.97,
    strand='plus',
    threads=1,
)
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: seqs1.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: table1.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 vsearch cluster-features-de-novo tool:
  - Set "sequences" to #: seqs1.qza
  - Set "table" to #: table1.qza
  - Set "perc_identity" to 0.97
  - Expand the additional options section
    - Leave "strand" as its default value of plus
  - Press the Execute button.
library(reticulate)

Artifact <- import("qiime2")$Artifact
request <- import("urllib")$request
vsearch_actions <- import("qiime2.plugins.vsearch.actions")

# Download and load the example feature sequences.
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/seqs1.qza'
fn <- 'seqs1.qza'
request$urlretrieve(url, fn)
seqs1 <- Artifact$load(fn)

# Download and load the example feature table.
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/vsearch/cluster-features-de-novo/1/table1.qza'
fn <- 'table1.qza'
request$urlretrieve(url, fn)
table1 <- Artifact$load(fn)

# Cluster the features de novo at 97% identity (no trailing comma after the last argument).
action_results <- vsearch_actions$cluster_features_de_novo(
  sequences=seqs1,
  table=table1,
  perc_identity=0.97,
  strand='plus',
  threads=1L
)
clustered_table <- action_results$clustered_table
clustered_sequences <- action_results$clustered_sequences
from q2_vsearch._examples import cluster_features_de_novo
cluster_features_de_novo(use)
vsearch cluster-features-closed-reference¶
Given a feature table and the associated feature sequences, cluster the features against a reference database based on a user-specified percent identity threshold of their sequences. This is not a general-purpose closed-reference clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table is clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. See the vsearch documentation for details on how sequence clustering is performed.
Citations¶
Inputs¶
- sequences:
FeatureData[Sequence]
The sequences corresponding to the features in table.[required]
- table:
FeatureTable[Frequency]
The feature table to be clustered.[required]
- reference_sequences:
FeatureData[Sequence]
The sequences to use as cluster centroids.[required]
Parameters¶
- perc_identity:
Float % Range(0, 1, inclusive_start=False, inclusive_end=True)
The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]
- strand:
Str % Choices('plus', 'both')
Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
- threads:
Threads
The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]
Outputs¶
- clustered_table:
FeatureTable[Frequency]
The table following clustering of features.[required]
- clustered_sequences:
FeatureData[Sequence]
The sequences representing clustered features, relabeled by the reference IDs.[required]
- unmatched_sequences:
FeatureData[Sequence]
The sequences which failed to match any reference sequences. This output maps to vsearch's --notmatched parameter.[required]
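The parameter types above use Range expressions such as Range(0, 1, inclusive_start=False, inclusive_end=True). The sketch below shows one way to read that notation, assuming that the start is inclusive and the end exclusive unless stated otherwise, and that None means unbounded. The in_range helper is hypothetical and is not the QIIME 2 implementation.

```python
def in_range(value, start, end, inclusive_start=True, inclusive_end=False):
    """Return True if value lies within the described interval.

    None for start or end means that side is unbounded. The default
    inclusivity is an assumption about how the notation is read.
    """
    if start is not None:
        if value < start or (value == start and not inclusive_start):
            return False
    if end is not None:
        if value > end or (value == end and not inclusive_end):
            return False
    return True


# perc_identity: Range(0, 1, inclusive_start=False, inclusive_end=True)
for candidate in (0, 0.97, 1.0, 1.5):
    ok = in_range(candidate, 0, 1, inclusive_start=False, inclusive_end=True)
    print(candidate, ok)
```

Read this way, perc_identity accepts 1.0 but rejects 0, so a value such as 0.97 is valid while a percentage such as 97 is not.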
vsearch dereplicate-sequences¶
Dereplicate sequence data and create a feature table and feature representative sequences. Feature identifiers in the resulting artifacts will be the sha1 hash of the sequence defining each feature. If clustering of features into OTUs is desired, the resulting artifacts can be passed to the cluster_features_* methods in this plugin.
Citations¶
Inputs¶
- sequences:
SampleData[Sequences] | SampleData[SequencesWithQuality] | SampleData[JoinedSequencesWithQuality]
The sequences to be dereplicated.[required]
Parameters¶
- derep_prefix:
Bool
Merge sequences with identical prefixes. If a sequence is identical to the prefix of two or more longer sequences, it is clustered with the shortest of them. If they are equally long, it is clustered with the most abundant.[default: False]
- min_seq_length:
Int % Range(1, None)
Discard sequences shorter than this integer.[default: 1]
- min_unique_size:
Int % Range(1, None)
Discard sequences with a post-dereplication abundance value smaller than this integer.[default: 1]
Outputs¶
- dereplicated_table:
FeatureTable[Frequency]
The table of dereplicated sequences.[required]
- dereplicated_sequences:
FeatureData[Sequence]
The dereplicated sequences.[required]
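As a rough illustration of full-length dereplication with SHA-1 feature identifiers, the sketch below uses only the Python standard library. The `dereplicate` helper and its input layout are hypothetical, exact string matching stands in for vsearch's own comparison, and applying `min_unique_size` to the abundance summed over all samples is an assumption of this sketch.

```python
import hashlib
from collections import Counter, defaultdict


def dereplicate(samples, min_seq_length=1, min_unique_size=1):
    """Collapse identical reads and label each unique sequence by its SHA-1 hash.

    samples: {sample_id: [sequence, ...]}
    Returns (table, representative_sequences).
    """
    total = Counter()
    per_sample = defaultdict(Counter)
    for sample_id, reads in samples.items():
        for read in reads:
            if len(read) < min_seq_length:
                continue  # too short, discarded before counting
            total[read] += 1
            per_sample[sample_id][read] += 1

    # Keep only unique sequences that are abundant enough overall.
    kept = {seq for seq, count in total.items() if count >= min_unique_size}
    ids = {seq: hashlib.sha1(seq.encode("ascii")).hexdigest() for seq in kept}

    table = {
        sample_id: {ids[seq]: n for seq, n in counts.items() if seq in kept}
        for sample_id, counts in per_sample.items()
    }
    representative_sequences = {ids[seq]: seq for seq in kept}
    return table, representative_sequences


# Hypothetical toy reads
samples = {"s1": ["ACGT", "ACGT", "TTGA"], "s2": ["ACGT", "GG"]}
derep_table, derep_seqs = dereplicate(samples, min_seq_length=3, min_unique_size=2)
print(derep_table)
print(derep_seqs)
```

Because the identifier is derived from the sequence itself, the same sequence receives the same feature ID in independent runs.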
vsearch merge-pairs¶
Merge paired-end sequence reads using vsearch's merge_pairs function. See the vsearch documentation for details on how paired-end merging is performed, and for more information on the parameters to this method.
Citations¶
Inputs¶
- demultiplexed_seqs:
SampleData[PairedEndSequencesWithQuality]
The demultiplexed paired-end sequences to be merged.[required]
Parameters¶
- truncqual:
Int % Range(0, None)
Truncate sequences at the first base with the specified quality score value or lower.[optional]
- minlen:
Int % Range(0, None)
Sequences shorter than minlen after truncation are discarded.[default: 1]
- maxns:
Int % Range(0, None)
Sequences with more than maxns N characters are discarded.[optional]
- allowmergestagger:
Bool
Allow merging of staggered read pairs.[default: False]
- minovlen:
Int % Range(5, None)
Minimum length of the area of overlap between reads during merging.[default: 10]
- maxdiffs:
Int % Range(0, None)
Maximum number of mismatches in the area of overlap during merging.[default: 10]
- minmergelen:
Int % Range(0, None)
Minimum length of the merged read to be retained.[optional]
- maxmergelen:
Int % Range(0, None)
Maximum length of the merged read to be retained.[optional]
- maxee:
Float % Range(0.0, None)
Maximum number of expected errors in the merged read to be retained.[optional]
- threads:
Threads
The number of threads to use for computation. Does not scale much past 4 threads.[default: 1]
Outputs¶
- merged_sequences:
SampleData[JoinedSequencesWithQuality]
The merged sequences.[required]
- unmerged_sequences:
SampleData[PairedEndSequencesWithQuality]
The unmerged paired-end reads.[required]
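To make the truncqual and maxee parameters more concrete, here is a small sketch of quality truncation and expected-error filtering. It is not vsearch's code: the helper names are hypothetical, qualities are assumed to be already-decoded integer Phred scores, and the expected error count is taken to be the sum of per-base error probabilities 10^(-Q/10).

```python
def truncate_at_quality(sequence, qualities, truncqual):
    """Cut the read at the first base whose quality is truncqual or lower."""
    for position, quality in enumerate(qualities):
        if quality <= truncqual:
            return sequence[:position], qualities[:position]
    return sequence, qualities


def expected_errors(qualities):
    """Sum of per-base error probabilities implied by Phred scores."""
    return sum(10 ** (-quality / 10) for quality in qualities)


def keep_merged_read(sequence, qualities, maxee=None,
                     minmergelen=None, maxmergelen=None):
    """Apply the optional merged-read filters; None means 'no limit'."""
    if minmergelen is not None and len(sequence) < minmergelen:
        return False
    if maxmergelen is not None and len(sequence) > maxmergelen:
        return False
    if maxee is not None and expected_errors(qualities) > maxee:
        return False
    return True


# Hypothetical toy reads
print(truncate_at_quality("ACGTAC", [30, 30, 2, 30, 30, 30], truncqual=2))
print(expected_errors([10, 20, 30]))
print(keep_merged_read("ACGT", [10, 20, 30, 40], maxee=0.2))
```

A read with a few low-quality bases can exceed a strict maxee even when its average quality looks acceptable, which is why the filter sums probabilities instead of averaging scores.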
vsearch uchime-ref¶
Apply the vsearch uchime_ref method to identify chimeric feature sequences. The results of this method can be used to filter chimeric features from the corresponding feature table. For additional details, please refer to the vsearch documentation.
Citations¶
Inputs¶
- sequences:
FeatureData[Sequence]
The feature sequences to be chimera-checked.[required]
- table:
FeatureTable[Frequency]
Feature table (used for computing total feature abundances).[required]
- reference_sequences:
FeatureData[Sequence]
The non-chimeric reference sequences.[required]
Parameters¶
- dn:
Float % Range(0.0, None)
No vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default: 1.4]
- mindiffs:
Int % Range(1, None)
Minimum number of differences per segment.[default: 3]
- mindiv:
Float % Range(0.0, None)
Minimum divergence from closest parent.[default: 0.8]
- minh:
Float % Range(0.0, 1.0, inclusive_end=True)
Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity.[default: 0.28]
- xn:
Float % Range(1.0, None, inclusive_start=False)
No vote weight, corresponding to the parameter beta in the scoring function.[default: 8.0]
- threads:
Threads
The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]
Outputs¶
- chimeras:
FeatureData[Sequence]
The chimeric sequences.[required]
- nonchimeras:
FeatureData[Sequence]
The non-chimeric sequences.[required]
- stats:
UchimeStats
Summary statistics from chimera checking.[required]
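The dn, xn, and minh parameters refer to a chimera scoring function that this page does not spell out. The sketch below assumes the form commonly given for the UCHIME score, in which each segment scores Y / (beta * (N + n) + A) from its yes (Y), no (N), and abstain (A) votes, and the left and right segment scores are multiplied. Treat that formula, the helper names, and the vote counts as assumptions for illustration; the sketch also ignores the mindiffs and mindiv checks.

```python
def segment_score(yes, no, abstain, dn=1.4, xn=8.0):
    """Score one segment from its vote counts (assumed UCHIME-style form)."""
    return yes / (xn * (no + dn) + abstain)


def chimera_score(left_votes, right_votes, dn=1.4, xn=8.0):
    """Combine left and right segment scores by multiplication (assumption)."""
    return segment_score(*left_votes, dn=dn, xn=xn) * \
        segment_score(*right_votes, dn=dn, xn=xn)


def exceeds_minh(left_votes, right_votes, minh=0.28, dn=1.4, xn=8.0):
    """True when the combined score reaches the minh threshold."""
    return chimera_score(left_votes, right_votes, dn=dn, xn=xn) >= minh


# Hypothetical vote counts given as (yes, no, abstain) per segment
clean_support = ((10, 0, 0), (10, 0, 0))
with_no_votes = ((10, 1, 0), (10, 1, 0))
print(chimera_score(*clean_support), exceeds_minh(*clean_support))
print(chimera_score(*with_no_votes), exceeds_minh(*with_no_votes))
```

Under this form, a single no vote per segment is heavily penalized through xn, which matches the description of xn as the weight applied to no votes.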
vsearch uchime-denovo¶
Apply the vsearch uchime_denovo method to identify chimeric feature sequences. The results of this method can be used to filter chimeric features from the corresponding feature table. For more details, please refer to the vsearch documentation.
Citations¶
Inputs¶
- sequences:
FeatureData[Sequence]
The feature sequences to be chimera-checked.[required]
- table:
FeatureTable[Frequency]
Feature table (used for computing total feature abundances).[required]
Parameters¶
- dn:
Float % Range(0.0, None)
No vote pseudo-count, corresponding to the parameter n in the chimera scoring function.[default: 1.4]
- mindiffs:
Int % Range(1, None)
Minimum number of differences per segment.[default: 3]
- mindiv:
Float % Range(0.0, None)
Minimum divergence from closest parent.[default: 0.8]
- minh:
Float % Range(0.0, 1.0, inclusive_end=True)
Minimum score (h). Increasing this value tends to reduce the number of false positives and to decrease sensitivity.[default: 0.28]
- xn:
Float % Range(1.0, None, inclusive_start=False)
No vote weight, corresponding to the parameter beta in the scoring function.[default: 8.0]
Outputs¶
- chimeras:
FeatureData[Sequence]
The chimeric sequences.[required]
- nonchimeras:
FeatureData[Sequence]
The non-chimeric sequences.[required]
- stats:
UchimeStats
Summary statistics from chimera checking.[required]
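The description notes that the chimera-checking results can be used to filter chimeric features from the feature table. A minimal sketch of that step follows, assuming a hypothetical nested-dictionary table and a set of nonchimeric feature IDs; in a real workflow a dedicated table-filtering action would do this work.

```python
def filter_table(table, keep_ids):
    """Keep only the features whose IDs appear in keep_ids."""
    return {
        sample_id: {fid: n for fid, n in frequencies.items() if fid in keep_ids}
        for sample_id, frequencies in table.items()
    }


def fraction_retained(original, filtered):
    """Share of total frequency that survives filtering."""
    before = sum(sum(freqs.values()) for freqs in original.values())
    after = sum(sum(freqs.values()) for freqs in filtered.values())
    return after / before if before else 0.0


# Hypothetical toy data: f3 was flagged as chimeric.
table = {
    "sample1": {"f1": 40, "f2": 10, "f3": 5},
    "sample2": {"f1": 30, "f3": 15},
}
nonchimera_ids = {"f1", "f2"}

filtered_table = filter_table(table, nonchimera_ids)
print(filtered_table)
print(fraction_retained(table, filtered_table))
```

Checking the retained fraction is a quick way to notice when chimera filtering removes an unexpectedly large share of the data.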
vsearch fastq-stats¶
A fastq overview via vsearch's fastq_stats, fastq_eestats and fastq_eestats2 utilities. Please see https://
Citations¶
Inputs¶
- sequences:
SampleData[SequencesWithQuality | PairedEndSequencesWithQuality]
Fastq sequences[required]
Parameters¶
- threads:
Threads
The number of threads used for computation.[default: 1]
Outputs¶
- visualization:
Visualization
<no description>[required]
vsearch cluster-features-open-reference¶
Given a feature table and the associated feature sequences, cluster the features against a reference database based on a user-specified percent identity threshold of their sequences. Any sequences that don't match are then clustered de novo. This is not a general-purpose clustering method, but rather is intended to be used for clustering the results of quality-filtering/dereplication methods, such as DADA2, or for re-clustering a FeatureTable at a lower percent identity than it was originally clustered at. When a group of features in the input table are clustered into a single feature, the frequency of that single feature in a given sample is the sum of the frequencies of the features that were clustered in that sample. Feature identifiers will be inherited from the centroid feature of each cluster. For features that match a reference sequence, the centroid feature is that reference sequence, so its identifier will become the feature identifier. The clustered_sequences result will contain feature representative sequences that are derived from the sequences input for all features in clustered_table. Each representative sequence will always be the most abundant sequence in its cluster. The new_reference_sequences result will contain the entire reference database, plus feature representative sequences for any de novo features. This is intended to be used as a reference database in subsequent iterations of cluster_features_open_reference, if applicable. See the vsearch documentation for details on how sequence clustering is performed.
Citations¶
Rognes et al., 2016; Rideout et al., 2014
Inputs¶
- sequences:
FeatureData[Sequence]
The sequences corresponding to the features in table.[required]
- table:
FeatureTable[Frequency]
The feature table to be clustered.[required]
- reference_sequences:
FeatureData[Sequence]
The sequences to use as cluster centroids.[required]
Parameters¶
- perc_identity:
Float % Range(0, 1, inclusive_start=False, inclusive_end=True)
The percent identity at which clustering should be performed. This parameter maps to vsearch's --id parameter.[required]
- strand:
Str % Choices('plus', 'both')
Search plus (i.e., forward) or both (i.e., forward and reverse complement) strands.[default: 'plus']
- threads:
Threads
The number of threads to use for computation. Passing 0 will launch one thread per CPU core.[default: 1]
Outputs¶
- clustered_table:
FeatureTable[Frequency]
The table following clustering of features.[required]
- clustered_sequences:
FeatureData[Sequence]
Sequences representing clustered features.[required]
- new_reference_sequences:
FeatureData[Sequence]
The new reference sequences. This can be used for subsequent runs of open-reference clustering for consistent definitions of features across open-reference feature tables.[required]
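The two-stage flow of open-reference clustering can be sketched in plain Python. This is only an outline under stated assumptions: `identity` is a toy position-by-position comparison standing in for vsearch's alignment-based identity, the de novo stage is assumed to be greedy and abundance-ordered, and all helper names and sequences are hypothetical.

```python
def identity(a, b):
    """Toy identity: matching positions over the longer length (not an alignment)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def best_hit(query, centroids, perc_identity):
    """Return the ID of the best centroid at or above the threshold, else None."""
    best_id, best_score = None, 0.0
    for centroid_id, centroid_seq in centroids.items():
        score = identity(query, centroid_seq)
        if score >= perc_identity and score > best_score:
            best_id, best_score = centroid_id, score
    return best_id


def open_reference(sequences, abundances, reference, perc_identity):
    assignments, unmatched = {}, []
    # Stage 1: closed-reference search against the reference database.
    for feature_id, seq in sequences.items():
        hit = best_hit(seq, reference, perc_identity)
        if hit is None:
            unmatched.append(feature_id)
        else:
            assignments[feature_id] = hit
    # Stage 2: greedy de novo clustering of the rest, most abundant first.
    de_novo = {}
    for feature_id in sorted(unmatched, key=lambda f: -abundances[f]):
        hit = best_hit(sequences[feature_id], de_novo, perc_identity)
        if hit is None:
            de_novo[feature_id] = sequences[feature_id]
            hit = feature_id
        assignments[feature_id] = hit
    # The new reference is the full reference plus the de novo centroids.
    return assignments, {**reference, **de_novo}


# Hypothetical toy data
reference = {"ref1": "AAAAAAAAAA"}
sequences = {"f1": "AAAAAAAAAA", "f2": "AAAAAAAAAT",
             "f3": "CCCCCCCCCC", "f4": "CCCCCCCCCG"}
abundances = {"f1": 50, "f2": 5, "f3": 30, "f4": 3}

assignments, new_reference = open_reference(sequences, abundances, reference, 0.85)
print(assignments)
print(sorted(new_reference))
```

Reusing `new_reference` as the reference in a later run is what keeps feature definitions consistent across tables, since previously discovered de novo centroids then behave like reference sequences.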
- Available Distros
- 2024.10
- 2024.10/amplicon
- 2024.10/metagenome
- 2024.5
- 2024.5/amplicon
- 2024.5/metagenome
- 2024.2
- 2024.2/amplicon
- 2023.9
- 2023.9/amplicon
- 2023.7
- 2023.7/core