This QIIME 2 plugin uses cutadapt to work with adapters (e.g. barcodes, primers) in sequence data.
- version: 2024.10.0
- website: https://github.com/qiime2/q2-cutadapt
- user support: Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
- citations: Martin, 2011
Actions¶
Name | Type | Short Description |
---|---|---|
trim-single | method | Find and remove adapters in demultiplexed single-end sequences. |
trim-paired | method | Find and remove adapters in demultiplexed paired-end sequences. |
demux-single | method | Demultiplex single-end sequence data with barcodes in-sequence. |
demux-paired | method | Demultiplex paired-end sequence data with barcodes in-sequence. |
cutadapt trim-single¶
Search demultiplexed single-end sequences for adapters and remove them. The parameter descriptions in this method are adapted from the official cutadapt docs - please see those docs at https://
Citations¶
Inputs¶
- demultiplexed_sequences: SampleData[SequencesWithQuality]
  The single-end sequences to be trimmed. [required]
Parameters¶
- cores: Threads
  Number of CPU cores to use. [default: 1]
- adapter: List[Str]
  Sequence of an adapter ligated to the 3' end. The adapter and any subsequent bases are trimmed. If a `$` is appended, the adapter is only found if it is at the end of the read. If your sequence of interest is "framed" by a 5' and a 3' adapter, use this parameter to define a "linked" primer; see https://cutadapt.readthedocs.io for complete details. [optional]
- front: List[Str]
  Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Partial matches at the 5' end are allowed. If a `^` character is prepended, the adapter is only found if it is at the beginning of the read. [optional]
- anywhere: List[Str]
  Sequence of an adapter that may be ligated to the 5' or 3' end. Both types of matches as described under `adapter` and `front` are allowed. If the first base of the read is part of the match, the behavior is as with `front`, otherwise as with `adapter`. This option is mostly for rescuing failed library preparations; do not use it if you know which end your adapter was ligated to. [optional]
- error_rate: Float % Range(0, 1, inclusive_end=True)
  Maximum allowed error rate. [default: 0.1]
- indels: Bool
  Allow insertions or deletions of bases when matching adapters. [default: True]
- times: Int % Range(1, None)
  Remove multiple occurrences of an adapter if it is repeated, up to `times` times. [default: 1]
- overlap: Int % Range(1, None)
  Require at least `overlap` bases of overlap between read and adapter for an adapter to be found. [default: 3]
- match_read_wildcards: Bool
  Interpret IUPAC wildcards (e.g., N) in reads. [default: False]
- match_adapter_wildcards: Bool
  Interpret IUPAC wildcards (e.g., N) in adapters. [default: True]
- minimum_length: Int % Range(1, None)
  Discard reads shorter than the specified value. Note that the cutadapt default of 0 has been overridden, because that value produces empty sequence records. [default: 1]
- discard_untrimmed: Bool
  Discard reads in which no adapter was found. [default: False]
- max_expected_errors: Float % Range(0, None)
  Discard reads whose expected number of erroneous nucleotides exceeds this value. [optional]
- max_n: Float % Range(0, None)
  Discard reads with more than `max_n` N bases. If the value is a number between 0 and 1, it is interpreted as a fraction of the read length. [optional]
- quality_cutoff_5end: Int % Range(0, None)
  Trim nucleotides with a Phred quality score lower than the threshold from the 5' end. [default: 0]
- quality_cutoff_3end: Int % Range(0, None)
  Trim nucleotides with a Phred quality score lower than the threshold from the 3' end. [default: 0]
- quality_base: Int % Range(0, None)
  How the Phred score is encoded (33 or 64). [default: 33]
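How `error_rate` and `overlap` interact is easier to see with numbers. The sketch below assumes, as the cutadapt documentation describes it, that the number of tolerated errors is the length of the matching region multiplied by the error rate, rounded down. The helper names are hypothetical and are not part of q2-cutadapt or cutadapt.

```python
import math


def allowed_errors(match_length: int, error_rate: float = 0.1) -> int:
    """Errors tolerated for a match of this length (assumption: rounded down)."""
    return math.floor(match_length * error_rate)


def is_candidate_match(match_length: int, errors: int,
                       error_rate: float = 0.1, overlap: int = 3) -> bool:
    """Return True if a (possibly partial) adapter match would be accepted.

    A match must cover at least `overlap` bases and contain no more
    errors than the error rate allows for its length.
    """
    if match_length < overlap:
        return False
    return errors <= allowed_errors(match_length, error_rate)
```

With the defaults, a 10-base match tolerates one error, while a 9-base match tolerates none, and a 2-base partial match is ignored regardless of errors because it is shorter than `overlap`.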
Outputs¶
- trimmed_sequences: SampleData[SequencesWithQuality]
  The resulting trimmed sequences. [required]
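The read-level filters described above (`minimum_length`, `max_n`, `max_expected_errors`, `quality_base`) can be sketched in plain Python. This is only an illustration under stated assumptions: the expected error count is taken to be the usual sum of per-base error probabilities 10^(-Q/10), a `max_n` value strictly between 0 and 1 is treated as a fraction of the read length, and all function names are hypothetical rather than part of the plugin.

```python
from typing import List, Optional


def phred_scores(quality_string: str, quality_base: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - quality_base for ch in quality_string]


def expected_errors(quality_string: str, quality_base: int = 33) -> float:
    """Sum of per-base error probabilities, 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores(quality_string, quality_base))


def exceeds_max_n(sequence: str, max_n: float) -> bool:
    """Apply max_n as a fraction (0 < max_n < 1) or as an absolute count."""
    n_count = sequence.upper().count("N")
    if 0 < max_n < 1:
        return n_count > max_n * len(sequence)
    return n_count > max_n


def keep_read(sequence: str, quality_string: str, minimum_length: int = 1,
              max_n: Optional[float] = None,
              max_expected_errors: Optional[float] = None,
              quality_base: int = 33) -> bool:
    """Return False if any of the enabled filters would discard the read."""
    if len(sequence) < minimum_length:
        return False
    if max_n is not None and exceeds_max_n(sequence, max_n):
        return False
    if (max_expected_errors is not None
            and expected_errors(quality_string, quality_base) > max_expected_errors):
        return False
    return True
```

The order in which the filters are checked here is arbitrary; the sketch makes no claim about the order cutadapt uses internally.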
cutadapt trim-paired¶
Search demultiplexed paired-end sequences for adapters and remove them. The parameter descriptions in this method are adapted from the official cutadapt docs - please see those docs at https://
Citations¶
Inputs¶
- demultiplexed_sequences: SampleData[PairedEndSequencesWithQuality]
  The paired-end sequences to be trimmed. [required]
Parameters¶
- cores: Threads
  Number of CPU cores to use. [default: 1]
- adapter_f: List[Str]
  Sequence of an adapter ligated to the 3' end. The adapter and any subsequent bases are trimmed. If a `$` is appended, the adapter is only found if it is at the end of the read. Search in forward read. If your sequence of interest is "framed" by a 5' and a 3' adapter, use this parameter to define a "linked" primer; see https://cutadapt.readthedocs.io for complete details. [optional]
- front_f: List[Str]
  Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Partial matches at the 5' end are allowed. If a `^` character is prepended, the adapter is only found if it is at the beginning of the read. Search in forward read. [optional]
- anywhere_f: List[Str]
  Sequence of an adapter that may be ligated to the 5' or 3' end. Both types of matches as described under `adapter` and `front` are allowed. If the first base of the read is part of the match, the behavior is as with `front`, otherwise as with `adapter`. This option is mostly for rescuing failed library preparations; do not use it if you know which end your adapter was ligated to. Search in forward read. [optional]
- adapter_r: List[Str]
  Sequence of an adapter ligated to the 3' end. The adapter and any subsequent bases are trimmed. If a `$` is appended, the adapter is only found if it is at the end of the read. Search in reverse read. If your sequence of interest is "framed" by a 5' and a 3' adapter, use this parameter to define a "linked" primer; see https://cutadapt.readthedocs.io for complete details. [optional]
- front_r: List[Str]
  Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Partial matches at the 5' end are allowed. If a `^` character is prepended, the adapter is only found if it is at the beginning of the read. Search in reverse read. [optional]
- anywhere_r: List[Str]
  Sequence of an adapter that may be ligated to the 5' or 3' end. Both types of matches as described under `adapter` and `front` are allowed. If the first base of the read is part of the match, the behavior is as with `front`, otherwise as with `adapter`. This option is mostly for rescuing failed library preparations; do not use it if you know which end your adapter was ligated to. Search in reverse read. [optional]
- error_rate: Float % Range(0, 1, inclusive_end=True)
  Maximum allowed error rate. [default: 0.1]
- indels: Bool
  Allow insertions or deletions of bases when matching adapters. [default: True]
- times: Int % Range(1, None)
  Remove multiple occurrences of an adapter if it is repeated, up to `times` times. [default: 1]
- overlap: Int % Range(1, None)
  Require at least `overlap` bases of overlap between read and adapter for an adapter to be found. [default: 3]
- match_read_wildcards: Bool
  Interpret IUPAC wildcards (e.g., N) in reads. [default: False]
- match_adapter_wildcards: Bool
  Interpret IUPAC wildcards (e.g., N) in adapters. [default: True]
- minimum_length: Int % Range(1, None)
  Discard reads shorter than the specified value. Note that the cutadapt default of 0 has been overridden, because that value produces empty sequence records. [default: 1]
- discard_untrimmed: Bool
  Discard reads in which no adapter was found. [default: False]
- max_expected_errors: Float % Range(0, None)
  Discard reads whose expected number of erroneous nucleotides exceeds this value. [optional]
- max_n: Float % Range(0, None)
  Discard reads with more than `max_n` N bases. If the value is a number between 0 and 1, it is interpreted as a fraction of the read length. [optional]
- quality_cutoff_5end: Int % Range(0, None)
  Trim nucleotides with a Phred quality score lower than the threshold from the 5' end. [default: 0]
- quality_cutoff_3end: Int % Range(0, None)
  Trim nucleotides with a Phred quality score lower than the threshold from the 3' end. [default: 0]
- quality_base: Int % Range(0, None)
  How the Phred score is encoded (33 or 64). [default: 33]
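The `quality_cutoff_3end` parameter is not a simple per-base threshold. As a rough sketch, assuming the running-sum approach that the cutadapt documentation describes for quality trimming (the cutoff is subtracted from each score, partial sums are taken from the 3' end, and the read is cut where the sum is smallest), the trim position could be computed as follows. The function name is hypothetical, and the plugin's exact behavior may differ in details.

```python
from typing import Sequence


def quality_trim_3end(qualities: Sequence[int], cutoff: int) -> int:
    """Return the number of bases to keep after 3' quality trimming.

    Walks from the 3' end, accumulating (quality - cutoff). The read is
    cut at the position where that running sum reaches its minimum, and
    the walk stops early once the sum becomes positive.
    """
    running_sum = 0
    minimum = 0
    keep = len(qualities)
    for i in reversed(range(len(qualities))):
        running_sum += qualities[i] - cutoff
        if running_sum > 0:
            break
        if running_sum < minimum:
            minimum = running_sum
            keep = i
    return keep


# Ten Phred scores that drop off toward the 3' end, trimmed with cutoff 10.
scores = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
kept = quality_trim_3end(scores, 10)  # keeps the first 4 bases
```

Note that the base with score 11 is still removed even though it is above the cutoff, because it is surrounded by low-quality bases. Trimming from the 5' end with `quality_cutoff_5end` would apply the same idea starting from the other end of the read.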
Outputs¶
- trimmed_sequences: SampleData[PairedEndSequencesWithQuality]
  The resulting trimmed sequences. [required]
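As a minimal sketch of how the parameters above map onto a command line, the helper below assembles an argument list for `qiime cutadapt trim-paired`. It assumes the usual QIIME 2 flag convention (`--i-` for inputs, `--p-` for parameters, `--o-` for outputs, underscores written as hyphens); the helper name, file names, and primer sequences are placeholders, and `qiime cutadapt trim-paired --help` remains the authoritative reference for the flags.

```python
import shlex
from typing import List, Sequence


def trim_paired_command(demux_qza: str, trimmed_qza: str,
                        front_f: Sequence[str], front_r: Sequence[str],
                        discard_untrimmed: bool = False,
                        cores: int = 1) -> List[str]:
    """Build (but do not run) a `qiime cutadapt trim-paired` command."""
    cmd = ["qiime", "cutadapt", "trim-paired",
           "--i-demultiplexed-sequences", demux_qza,
           "--p-cores", str(cores)]
    for primer in front_f:
        cmd += ["--p-front-f", primer]   # 5' adapter, forward read
    for primer in front_r:
        cmd += ["--p-front-r", primer]   # 5' adapter, reverse read
    if discard_untrimmed:
        cmd.append("--p-discard-untrimmed")
    cmd += ["--o-trimmed-sequences", trimmed_qza]
    return cmd


# Placeholder primers; the leading "^" anchors each one to the read start.
command = trim_paired_command("demux.qza", "trimmed.qza",
                              front_f=["^ACGTACGTAC"], front_r=["^TGCATGCATG"],
                              discard_untrimmed=True)
command_line = shlex.join(command)
```

Building a list rather than a single string avoids shell-quoting problems with characters such as `^` and `$`; the list can be passed to `subprocess.run` as is.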
cutadapt demux-single¶
Demultiplex sequence data (i.e., map barcode reads to sample ids). Barcodes are expected to be located within the sequence data (versus the header, or a separate barcode file).
Citations¶
Inputs¶
- seqs: MultiplexedSingleEndBarcodeInSequence
  The single-end sequences to be demultiplexed. [required]
Parameters¶
- barcodes: MetadataColumn[Categorical]
  The sample metadata column listing the per-sample barcodes. [required]
- cut: Int
  Remove the specified number of bases from the sequences. Bases are removed before demultiplexing. If a positive value is provided, bases are removed from the beginning of the sequences. If a negative value is provided, bases are removed from the end of the sequences. [default: 0]
- anchor_barcode: Bool
  Anchor the barcode. The barcode is then expected to occur in full length at the beginning (5' end) of the sequence. Can speed up demultiplexing if used. [default: False]
- error_rate: Float % Range(0, 1, inclusive_end=True)
  The level of error tolerance, specified as the maximum allowable error rate. The default value specified by cutadapt is 0.1 (=10%), which is greater than the default of `demux emp-*`, which is 0.0 (=0%). [default: 0.1]
- batch_size: Int % Range(0, None)
  The number of samples cutadapt demultiplexes concurrently. Demultiplexing in smaller batches will yield the same result with marginal speed loss, and may solve "too many files" errors related to sample quantity. Set to "0" to process all samples at once. [default: 0]
- minimum_length: Int % Range(1, None)
  Discard reads shorter than the specified value. Note that the cutadapt default of 0 has been overridden, because that value produces empty sequence records. [default: 1]
- cores: Threads
  Number of CPU cores to use. [default: 1]
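The sign convention of `cut` can be illustrated with ordinary string slicing. This is only a sketch of the behavior described above; the function name is hypothetical, and the real trimming is performed by cutadapt on both the sequence and its quality string.

```python
def apply_cut(sequence: str, cut: int = 0) -> str:
    """Remove `cut` bases: from the start if positive, from the end if negative."""
    if cut > 0:
        return sequence[cut:]
    if cut < 0:
        return sequence[:cut]
    return sequence


read = "AACCGGTT"
from_start = apply_cut(read, 2)    # drops the first two bases
from_end = apply_cut(read, -3)     # drops the last three bases
```

Because bases are removed before demultiplexing, a positive `cut` is useful when a fixed-length spacer precedes the barcode: removing it places the barcode at the start of the read, where `anchor_barcode` expects it.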
Outputs¶
- per_sample_sequences: SampleData[SequencesWithQuality]
  The resulting demultiplexed sequences. [required]
- untrimmed_sequences: MultiplexedSingleEndBarcodeInSequence
  The sequences that were unmatched to barcodes. [required]
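To make the `batch_size` parameter more concrete, the sketch below partitions a sample-to-barcode mapping into batches, treating 0 as "all samples in one batch" as described above. It only illustrates the partitioning; how reads are passed from one batch to the next is not shown, and the function name and sample data are hypothetical.

```python
from typing import Dict, Iterator


def barcode_batches(barcodes: Dict[str, str],
                    batch_size: int = 0) -> Iterator[Dict[str, str]]:
    """Yield successive groups of at most `batch_size` samples.

    A batch size of 0 means that all samples are processed at once.
    """
    items = list(barcodes.items())
    if not items:
        return
    if batch_size == 0:
        batch_size = len(items)
    for start in range(0, len(items), batch_size):
        yield dict(items[start:start + batch_size])


samples = {"s1": "AAAA", "s2": "CCCC", "s3": "GGGG", "s4": "TTTT", "s5": "ACGT"}
batches = list(barcode_batches(samples, batch_size=2))
```

Each batch needs one open output file per sample, which is why smaller batches can avoid "too many files" errors when the number of samples is large.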
Examples¶
demux_single¶
wget -O 'seqs.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/seqs.qza'
wget -O 'md.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/md.tsv'
qiime cutadapt demux-single \
  --i-seqs seqs.qza \
  --m-barcodes-file md.tsv \
  --m-barcodes-column BarcodeSequence \
  --o-per-sample-sequences per-sample-sequences.qza \
  --o-untrimmed-sequences untrimmed-sequences.qza
from qiime2 import Artifact
from qiime2 import Metadata
from urllib import request
import qiime2.plugins.cutadapt.actions as cutadapt_actions
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/seqs.qza'
fn = 'seqs.qza'
request.urlretrieve(url, fn)
seqs = Artifact.load(fn)
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/md.tsv'
fn = 'md.tsv'
request.urlretrieve(url, fn)
md_md = Metadata.load(fn)
barcodes_mdc = md_md.get_column('BarcodeSequence')
per_sample_sequences, untrimmed_sequences = cutadapt_actions.demux_single(
    seqs=seqs,
    barcodes=barcodes_mdc,
)
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: seqs.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/seqs.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: md.tsv
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/md.tsv
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 cutadapt demux-single tool:
  - Set "seqs" to #: seqs.qza
  - For "barcodes":
    - Leave as Metadata from TSV
    - Set "Metadata Source" to md.tsv
    - Set "Column Name" to BarcodeSequence
  - Press the Execute button.
library(reticulate)
Artifact <- import("qiime2")$Artifact
Metadata <- import("qiime2")$Metadata
cutadapt_actions <- import("qiime2.plugins.cutadapt.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/seqs.qza'
fn <- 'seqs.qza'
request$urlretrieve(url, fn)
seqs <- Artifact$load(fn)
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/md.tsv'
fn <- 'md.tsv'
request$urlretrieve(url, fn)
md_md <- Metadata$load(fn)
barcodes_mdc <- md_md$get_column('BarcodeSequence')
action_results <- cutadapt_actions$demux_single(
    seqs=seqs,
    barcodes=barcodes_mdc
)
per_sample_sequences <- action_results$per_sample_sequences
untrimmed_sequences <- action_results$untrimmed_sequences
from q2_cutadapt._examples import cutadapt_demux_single
cutadapt_demux_single(use)
cutadapt demux-paired¶
Demultiplex sequence data (i.e., map barcode reads to sample ids). Barcodes are expected to be located within the sequence data (versus the header, or a separate barcode file).
Citations¶
Inputs¶
- seqs: MultiplexedPairedEndBarcodeInSequence
  The paired-end sequences to be demultiplexed. [required]
Parameters¶
- forward_barcodes: MetadataColumn[Categorical]
  The sample metadata column listing the per-sample barcodes for the forward reads. [required]
- reverse_barcodes: MetadataColumn[Categorical]
  The sample metadata column listing the per-sample barcodes for the reverse reads. [optional]
- forward_cut: Int
  Remove the specified number of bases from the forward sequences. Bases are removed before demultiplexing. If a positive value is provided, bases are removed from the beginning of the sequences. If a negative value is provided, bases are removed from the end of the sequences. If --p-mixed-orientation is set, then both --p-forward-cut and --p-reverse-cut must be set to the same value. [default: 0]
- reverse_cut: Int
  Remove the specified number of bases from the reverse sequences. Bases are removed before demultiplexing. If a positive value is provided, bases are removed from the beginning of the sequences. If a negative value is provided, bases are removed from the end of the sequences. If --p-mixed-orientation is set, then both --p-forward-cut and --p-reverse-cut must be set to the same value. [default: 0]
- anchor_forward_barcode: Bool
  Anchor the forward barcode. The barcode is then expected to occur in full length at the beginning (5' end) of the forward sequence. Can speed up demultiplexing if used. [default: False]
- anchor_reverse_barcode: Bool
  Anchor the reverse barcode. The barcode is then expected to occur in full length at the beginning (5' end) of the reverse sequence. Can speed up demultiplexing if used. [default: False]
- error_rate: Float % Range(0, 1, inclusive_end=True)
  The level of error tolerance, specified as the maximum allowable error rate. [default: 0.1]
- batch_size: Int % Range(0, None)
  The number of samples cutadapt demultiplexes concurrently. Demultiplexing in smaller batches will yield the same result with marginal speed loss, and may solve "too many files" errors related to sample quantity. Set to "0" to process all samples at once. [default: 0]
- minimum_length: Int % Range(1, None)
  Discard reads shorter than the specified value. Note that the cutadapt default of 0 has been overridden, because that value produces empty sequence records. [default: 1]
- mixed_orientation: Bool
  Handle demultiplexing of mixed orientation reads (i.e. when forward and reverse reads coexist in the same file). [default: False]
- cores: Threads
  Number of CPU cores to use. [default: 1]
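The constraint that links `mixed_orientation` to the two cut parameters can be checked before launching a long demultiplexing run. The snippet below is a small pre-flight check written for illustration; the function name is hypothetical and the plugin performs its own validation.

```python
def cuts_are_compatible(forward_cut: int = 0, reverse_cut: int = 0,
                        mixed_orientation: bool = False) -> bool:
    """Return True if the cut settings satisfy the documented constraint.

    With mixed-orientation reads, forward_cut and reverse_cut must be
    equal; otherwise any combination of values is acceptable.
    """
    if mixed_orientation:
        return forward_cut == reverse_cut
    return True


ok = cuts_are_compatible(forward_cut=4, reverse_cut=4, mixed_orientation=True)
conflict = cuts_are_compatible(forward_cut=4, reverse_cut=0, mixed_orientation=True)
```

The requirement is plausible because, when forward and reverse reads coexist in the same file, a given read may be of either orientation, so an orientation-specific cut would be ambiguous.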
Outputs¶
- per_sample_sequences: SampleData[PairedEndSequencesWithQuality]
  The resulting demultiplexed sequences. [required]
- untrimmed_sequences: MultiplexedPairedEndBarcodeInSequence
  The sequences that were unmatched to barcodes. [required]
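The relationship between `forward_barcodes`, the optional `reverse_barcodes`, and the two outputs can be sketched with a deliberately simplified matcher. It assumes anchored barcodes and exact matches only (no error tolerance, no indels), so it is a conceptual illustration rather than a description of how cutadapt assigns reads; all names and sequences are hypothetical.

```python
from typing import Dict, Optional


def assign_pair(forward_read: str, reverse_read: str,
                forward_barcodes: Dict[str, str],
                reverse_barcodes: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the sample ID for a read pair, or None if no barcode matches.

    A pair is assigned when the forward read starts with the sample's
    forward barcode and, if reverse barcodes are given, the reverse read
    also starts with that sample's reverse barcode.
    """
    for sample_id, forward_barcode in forward_barcodes.items():
        if not forward_read.startswith(forward_barcode):
            continue
        if reverse_barcodes is not None:
            if not reverse_read.startswith(reverse_barcodes[sample_id]):
                continue
        return sample_id
    return None


fwd = {"s1": "AAAA", "s2": "AAAA", "s3": "CCCC"}
rev = {"s1": "GGGG", "s2": "TTTT", "s3": "GGGG"}
sample = assign_pair("AAAACGTACG", "TTTTCGTACG", fwd, rev)  # dual-indexed lookup
unmatched = assign_pair("GGGGCGTACG", "TTTTCGTACG", fwd, rev)
```

In this picture, pairs with a sample ID would end up in `per_sample_sequences`, and pairs for which the function returns `None` correspond to `untrimmed_sequences`. Samples `s1` and `s2` share a forward barcode here, which is only resolvable because the reverse barcodes differ.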
Examples¶
paired¶
wget -O 'seqs.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/seqs.qza'
wget -O 'md.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/md.tsv'
qiime cutadapt demux-paired \
  --i-seqs seqs.qza \
  --m-forward-barcodes-file md.tsv \
  --m-forward-barcodes-column barcode-sequence \
  --o-per-sample-sequences per-sample-sequences.qza \
  --o-untrimmed-sequences untrimmed-sequences.qza
from qiime2 import Artifact
from qiime2 import Metadata
from urllib import request
import qiime2.plugins.cutadapt.actions as cutadapt_actions
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/seqs.qza'
fn = 'seqs.qza'
request.urlretrieve(url, fn)
seqs = Artifact.load(fn)
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/md.tsv'
fn = 'md.tsv'
request.urlretrieve(url, fn)
md_md = Metadata.load(fn)
barcodes_mdc = md_md.get_column('barcode-sequence')
per_sample_sequences, untrimmed_sequences = cutadapt_actions.demux_paired(
    seqs=seqs,
    forward_barcodes=barcodes_mdc,
)
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: seqs.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/seqs.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: md.tsv
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/md.tsv
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 cutadapt demux-paired tool:
  - Set "seqs" to #: seqs.qza
  - For "forward_barcodes":
    - Leave as Metadata from TSV
    - Set "Metadata Source" to md.tsv
    - Set "Column Name" to barcode-sequence
  - Press the Execute button.
library(reticulate)
Artifact <- import("qiime2")$Artifact
Metadata <- import("qiime2")$Metadata
cutadapt_actions <- import("qiime2.plugins.cutadapt.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/seqs.qza'
fn <- 'seqs.qza'
request$urlretrieve(url, fn)
seqs <- Artifact$load(fn)
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/md.tsv'
fn <- 'md.tsv'
request$urlretrieve(url, fn)
md_md <- Metadata$load(fn)
barcodes_mdc <- md_md$get_column('barcode-sequence')
action_results <- cutadapt_actions$demux_paired(
    seqs=seqs,
    forward_barcodes=barcodes_mdc
)
per_sample_sequences <- action_results$per_sample_sequences
untrimmed_sequences <- action_results$untrimmed_sequences
from q2_cutadapt._examples import cutadapt_demux_paired
cutadapt_demux_paired(use)
This QIIME 2 plugin uses cutadapt to work with adapters (e.g. barcodes, primers) in sequence data.
- version:
2024.10.0
- website: https://
github .com /qiime2 /q2 -cutadapt - user support:
- Please post to the QIIME 2 forum for help with this plugin: https://
forum .qiime2 .org - citations:
- Martin, 2011
Actions¶
Name | Type | Short Description |
---|---|---|
trim-single | method | Find and remove adapters in demultiplexed single-end sequences. |
trim-paired | method | Find and remove adapters in demultiplexed paired-end sequences. |
demux-single | method | Demultiplex single-end sequence data with barcodes in-sequence. |
demux-paired | method | Demultiplex paired-end sequence data with barcodes in-sequence. |
cutadapt trim-single¶
Search demultiplexed single-end sequences for adapters and remove them. The parameter descriptions in this method are adapted from the official cutadapt docs - please see those docs at https://
Citations¶
Inputs¶
- demultiplexed_sequences:
SampleData[SequencesWithQuality]
The single-end sequences to be trimmed.[required]
Parameters¶
- cores:
Threads
Number of CPU cores to use.[default:
1
]- adapter:
List
[
Str
]
Sequence of an adapter ligated to the 3' end. The adapter and any subsequent bases are trimmed. If a
$
is appended, the adapter is only found if it is at the end of the read. If your sequence of interest is "framed" by a 5' and a 3' adapter, use this parameter to define a "linked" primer - see https://cutadapt .readthedocs .io for complete details.[optional] - front:
List
[
Str
]
Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Partial matches at the 5' end are allowed. If a
^
character is prepended, the adapter is only found if it is at the beginning of the read.[optional]- anywhere:
List
[
Str
]
Sequence of an adapter that may be ligated to the 5' or 3' end. Both types of matches as described under
adapter
andfront
are allowed. If the first base of the read is part of the match, the behavior is as withfront
, otherwise as withadapter
. This option is mostly for rescuing failed library preparations - do not use if you know which end your adapter was ligated to.[optional]- error_rate:
Float
%
Range
(0, 1, inclusive_end=True)
Maximum allowed error rate.[default:
0.1
]- indels:
Bool
Allow insertions or deletions of bases when matching adapters.[default:
True
]- times:
Int
%
Range
(1, None)
Remove multiple occurrences of an adapter if it is repeated, up to
times
times.[default:1
]- overlap:
Int
%
Range
(1, None)
Require at least
overlap
bases of overlap between read and adapter for an adapter to be found.[default:3
]- match_read_wildcards:
Bool
Interpret IUPAC wildcards (e.g., N) in reads.[default:
False
]- match_adapter_wildcards:
Bool
Interpret IUPAC wildcards (e.g., N) in adapters.[default:
True
]- minimum_length:
Int
%
Range
(1, None)
Discard reads shorter than specified value. Note, the cutadapt default of 0 has been overridden, because that value produces empty sequence records.[default:
1
]- discard_untrimmed:
Bool
Discard reads in which no adapter was found.[default:
False
]- max_expected_errors:
Float
%
Range
(0, None)
Discard reads that exceed maximum expected erroneous nucleotides.[optional]
- max_n:
Float
%
Range
(0, None)
Discard reads with more than COUNT N bases. If COUNT_or_FRACTION is a number between 0 and 1, it is interpreted as a fraction of the read length.[optional]
- quality_cutoff_5end:
Int
%
Range
(0, None)
Trim nucleotides with Phred score quality lower than threshold from 5 prime end.[default:
0
]- quality_cutoff_3end:
Int
%
Range
(0, None)
Trim nucleotides with Phred score quality lower than threshold from 3 prime end.[default:
0
]- quality_base:
Int
%
Range
(0, None)
How the Phred score is encoded (33 or 64).[default:
33
]
Outputs¶
- trimmed_sequences:
SampleData[SequencesWithQuality]
The resulting trimmed sequences.[required]
cutadapt trim-paired¶
Search demultiplexed paired-end sequences for adapters and remove them. The parameter descriptions in this method are adapted from the official cutadapt docs - please see those docs at https://
Citations¶
Inputs¶
- demultiplexed_sequences:
SampleData[PairedEndSequencesWithQuality]
The paired-end sequences to be trimmed.[required]
Parameters¶
- cores:
Threads
Number of CPU cores to use.[default:
1
]- adapter_f:
List
[
Str
]
Sequence of an adapter ligated to the 3' end. The adapter and any subsequent bases are trimmed. If a
$
is appended, the adapter is only found if it is at the end of the read. Search in forward read. If your sequence of interest is "framed" by a 5' and a 3' adapter, use this parameter to define a "linked" primer - see https://cutadapt .readthedocs .io for complete details.[optional] - front_f:
List
[
Str
]
Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Partial matches at the 5' end are allowed. If a
^
character is prepended, the adapter is only found if it is at the beginning of the read. Search in forward read.[optional]- anywhere_f:
List
[
Str
]
Sequence of an adapter that may be ligated to the 5' or 3' end. Both types of matches as described under
adapter
andfront
are allowed. If the first base of the read is part of the match, the behavior is as withfront
, otherwise as withadapter
. This option is mostly for rescuing failed library preparations - do not use if you know which end your adapter was ligated to. Search in forward read.[optional]- adapter_r:
List
[
Str
]
Sequence of an adapter ligated to the 3' end. The adapter and any subsequent bases are trimmed. If a
$
is appended, the adapter is only found if it is at the end of the read. Search in reverse read. If your sequence of interest is "framed" by a 5' and a 3' adapter, use this parameter to define a "linked" primer - see https://cutadapt .readthedocs .io for complete details.[optional] - front_r:
List
[
Str
]
Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Partial matches at the 5' end are allowed. If a
^
character is prepended, the adapter is only found if it is at the beginning of the read. Search in reverse read.[optional]- anywhere_r:
List
[
Str
]
Sequence of an adapter that may be ligated to the 5' or 3' end. Both types of matches as described under
adapter
andfront
are allowed. If the first base of the read is part of the match, the behavior is as withfront
, otherwise as withadapter
. This option is mostly for rescuing failed library preparations - do not use if you know which end your adapter was ligated to. Search in reverse read.[optional]- error_rate:
Float
%
Range
(0, 1, inclusive_end=True)
Maximum allowed error rate.[default:
0.1
]- indels:
Bool
Allow insertions or deletions of bases when matching adapters.[default:
True
]- times:
Int
%
Range
(1, None)
Remove multiple occurrences of an adapter if it is repeated, up to
times
times.[default:1
]- overlap:
Int
%
Range
(1, None)
Require at least
overlap
bases of overlap between read and adapter for an adapter to be found.[default:3
]- match_read_wildcards:
Bool
Interpret IUPAC wildcards (e.g., N) in reads.[default:
False
]- match_adapter_wildcards:
Bool
Interpret IUPAC wildcards (e.g., N) in adapters.[default:
True
]- minimum_length:
Int
%
Range
(1, None)
Discard reads shorter than specified value. Note, the cutadapt default of 0 has been overridden, because that value produces empty sequence records.[default:
1
]- discard_untrimmed:
Bool
Discard reads in which no adapter was found.[default:
False
]- max_expected_errors:
Float
%
Range
(0, None)
Discard reads that exceed maximum expected erroneous nucleotides.[optional]
- max_n:
Float
%
Range
(0, None)
Discard reads with more than COUNT N bases. If COUNT_or_FRACTION is a number between 0 and 1, it is interpreted as a fraction of the read length.[optional]
- quality_cutoff_5end:
Int
%
Range
(0, None)
Trim nucleotides with Phred score quality lower than threshold from 5 prime end.[default:
0
]- quality_cutoff_3end:
Int
%
Range
(0, None)
Trim nucleotides with Phred score quality lower than threshold from 3 prime end.[default:
0
]- quality_base:
Int
%
Range
(0, None)
How the Phred score is encoded (33 or 64).[default:
33
]
Outputs¶
- trimmed_sequences:
SampleData[PairedEndSequencesWithQuality]
The resulting trimmed sequences.[required]
cutadapt demux-single¶
Demultiplex sequence data (i.e., map barcode reads to sample ids). Barcodes are expected to be located within the sequence data (versus the header, or a separate barcode file).
Citations¶
Inputs¶
- seqs:
MultiplexedSingleEndBarcodeInSequence
The single-end sequences to be demultiplexed.[required]
Parameters¶
- barcodes:
MetadataColumn
[
Categorical
]
The sample metadata column listing the per-sample barcodes.[required]
- cut:
Int
Remove the specified number of bases from the sequences. Bases are removed before demultiplexing. If a positive value is provided, bases are removed from the beginning of the sequences. If a negative value is provided, bases are removed from the end of the sequences.[default:
0
]- anchor_barcode:
Bool
Anchor the barcode. The barcode is then expected to occur in full length at the beginning (5' end) of the sequence. Can speed up demultiplexing if used.[default:
False
]- error_rate:
Float
%
Range
(0, 1, inclusive_end=True)
The level of error tolerance, specified as the maximum allowable error rate. The default value specified by cutadapt is 0.1 (=10%), which is greater than
demux emp-*
, which is 0.0 (=0%).[default:0.1
]- batch_size:
Int
%
Range
(0, None)
The number of samples cutadapt demultiplexes concurrently. Demultiplexing in smaller batches will yield the same result with marginal speed loss, and may solve "too many files" errors related to sample quantity. Set to "0" to process all samples at once.[default:
0
]- minimum_length:
Int
%
Range
(1, None)
Discard reads shorter than specified value. Note, the cutadapt default of 0 has been overridden, because that value produces empty sequence records.[default:
1
]- cores:
Threads
Number of CPU cores to use.[default:
1
]
Outputs¶
- per_sample_sequences:
SampleData[SequencesWithQuality]
The resulting demultiplexed sequences.[required]
- untrimmed_sequences:
MultiplexedSingleEndBarcodeInSequence
The sequences that were unmatched to barcodes.[required]
Examples¶
demux_single¶
wget -O 'seqs.qza' \
'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/seqs.qza'
wget -O 'md.tsv' \
'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/md.tsv'
qiime cutadapt demux-single \
--i-seqs seqs.qza \
--m-barcodes-file md.tsv \
--m-barcodes-column BarcodeSequence \
--o-per-sample-sequences per-sample-sequences.qza \
--o-untrimmed-sequences untrimmed-sequences.qza
from qiime2 import Artifact
from qiime2 import Metadata
from urllib import request
import qiime2.plugins.cutadapt.actions as cutadapt_actions
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/seqs.qza'
fn = 'seqs.qza'
request.urlretrieve(url, fn)
seqs = Artifact.load(fn)
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/md.tsv'
fn = 'md.tsv'
request.urlretrieve(url, fn)
md_md = Metadata.load(fn)
barcodes_mdc = md_md.get_column('BarcodeSequence')
per_sample_sequences, untrimmed_sequences = cutadapt_actions.demux_single(
seqs=seqs,
barcodes=barcodes_mdc,
)
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: seqs.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/seqs.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: md.tsv
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/md.tsv
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 cutadapt demux-single tool:
  - Set "seqs" to #: seqs.qza
  - For "barcodes":
    - Leave as Metadata from TSV
    - Set "Metadata Source" to md.tsv
    - Set "Column Name" to BarcodeSequence
  - Press the Execute button.
library(reticulate)

Artifact <- import("qiime2")$Artifact
Metadata <- import("qiime2")$Metadata
cutadapt_actions <- import("qiime2.plugins.cutadapt.actions")
request <- import("urllib")$request

# Download the multiplexed sequences and load them as an Artifact.
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/seqs.qza'
fn <- 'seqs.qza'
request$urlretrieve(url, fn)
seqs <- Artifact$load(fn)

# Download the sample metadata and select the barcode column.
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/md.tsv'
fn <- 'md.tsv'
request$urlretrieve(url, fn)
md_md <- Metadata$load(fn)
barcodes_mdc <- md_md$get_column('BarcodeSequence')

# Demultiplex, then pull the two outputs out of the results object.
action_results <- cutadapt_actions$demux_single(
  seqs=seqs,
  barcodes=barcodes_mdc
)
per_sample_sequences <- action_results$per_sample_sequences
untrimmed_sequences <- action_results$untrimmed_sequences
from q2_cutadapt._examples import cutadapt_demux_single
cutadapt_demux_single(use)
cutadapt demux-paired¶
Demultiplex sequence data (i.e., map barcode reads to sample ids). Barcodes are expected to be located within the sequence data (versus the header, or a separate barcode file).
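To make the idea of in-sequence barcodes concrete, here is a minimal conceptual sketch in plain Python. It is not how cutadapt or this plugin is implemented: it handles only exact barcode matches at the very start of a single read, whereas cutadapt performs error-tolerant matching. The `toy_demux` helper, the sample ids, and the barcodes are all hypothetical.

```python
def toy_demux(reads, barcode_to_sample):
    """Assign reads to samples by an exact barcode at the start of each read.

    Conceptual illustration only; error-tolerant matching is not attempted.
    """
    per_sample = {sample: [] for sample in barcode_to_sample.values()}
    unmatched = []
    for read in reads:
        for barcode, sample in barcode_to_sample.items():
            if read.startswith(barcode):
                # Strip the barcode and keep the remaining bases.
                per_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched; analogous to the untrimmed_sequences output.
            unmatched.append(read)
    return per_sample, unmatched


# Hypothetical barcodes and reads.
barcodes = {"AAAA": "sample_a", "CCCC": "sample_b"}
reads = ["AAAAGATTACA", "CCCCTTGACC", "GGGGACGTAC"]
per_sample, unmatched = toy_demux(reads, barcodes)
print(per_sample)
print(unmatched)
```

The two return values mirror the two outputs of the action: reads grouped by sample, and reads that matched no barcode.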
Citations¶
Inputs¶
- seqs: MultiplexedPairedEndBarcodeInSequence
  The paired-end sequences to be demultiplexed. [required]
Parameters¶
- forward_barcodes: MetadataColumn[Categorical]
  The sample metadata column listing the per-sample barcodes for the forward reads. [required]
- reverse_barcodes: MetadataColumn[Categorical]
  The sample metadata column listing the per-sample barcodes for the reverse reads. [optional]
- forward_cut: Int
  Remove the specified number of bases from the forward sequences. Bases are removed before demultiplexing. If a positive value is provided, bases are removed from the beginning of the sequences. If a negative value is provided, bases are removed from the end of the sequences. If --p-mixed-orientation is set, then both --p-forward-cut and --p-reverse-cut must be set to the same value. [default: 0]
- reverse_cut: Int
  Remove the specified number of bases from the reverse sequences. Bases are removed before demultiplexing. If a positive value is provided, bases are removed from the beginning of the sequences. If a negative value is provided, bases are removed from the end of the sequences. If --p-mixed-orientation is set, then both --p-forward-cut and --p-reverse-cut must be set to the same value. [default: 0]
- anchor_forward_barcode: Bool
  Anchor the forward barcode. The barcode is then expected to occur in full length at the beginning (5' end) of the forward sequence. Can speed up demultiplexing if used. [default: False]
- anchor_reverse_barcode: Bool
  Anchor the reverse barcode. The barcode is then expected to occur in full length at the beginning (5' end) of the reverse sequence. Can speed up demultiplexing if used. [default: False]
- error_rate: Float % Range(0, 1, inclusive_end=True)
  The level of error tolerance, specified as the maximum allowable error rate. [default: 0.1]
- batch_size: Int % Range(0, None)
  The number of samples cutadapt demultiplexes concurrently. Demultiplexing in smaller batches will yield the same result with marginal speed loss, and may solve "too many files" errors related to sample quantity. Set to "0" to process all samples at once. [default: 0]
- minimum_length: Int % Range(1, None)
  Discard reads shorter than specified value. Note, the cutadapt default of 0 has been overridden, because that value produces empty sequence records. [default: 1]
- mixed_orientation: Bool
  Handle demultiplexing of mixed orientation reads (i.e. when forward and reverse reads coexist in the same file). [default: False]
- cores: Threads
  Number of CPU cores to use. [default: 1]
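The cut and error-rate parameters above can be illustrated with a small plain-Python sketch. The helper names are hypothetical and this is not the plugin's code. The error budget assumes cutadapt's documented rule that the number of allowed errors for a full-length match is the adapter length multiplied by the error rate, rounded down; check the cutadapt documentation for partial matches.

```python
import math


def apply_cut(sequence, cut):
    """Remove bases as described for forward_cut / reverse_cut."""
    if cut > 0:
        return sequence[cut:]   # positive: drop bases from the beginning
    if cut < 0:
        return sequence[:cut]   # negative: drop bases from the end
    return sequence             # zero: leave the sequence unchanged


def allowed_errors(barcode, error_rate):
    """Assumed error budget for a full-length match: floor(length * rate)."""
    return math.floor(len(barcode) * error_rate)


def check_cuts(forward_cut, reverse_cut, mixed_orientation):
    """Raise if the documented mixed-orientation constraint is violated."""
    if mixed_orientation and forward_cut != reverse_cut:
        raise ValueError(
            "forward_cut and reverse_cut must be equal "
            "when mixed_orientation is set"
        )


print(apply_cut("ACGTACGT", 2))
print(apply_cut("ACGTACGT", -3))
print(allowed_errors("ACGTACGT", 0.1))
print(allowed_errors("ACGTACGTACGT", 0.1))
```

Under that assumption, an 8-base barcode at the default error rate of 0.1 tolerates no errors, while a 12-base barcode tolerates one.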
Outputs¶
- per_sample_sequences: SampleData[PairedEndSequencesWithQuality]
  The resulting demultiplexed sequences. [required]
- untrimmed_sequences: MultiplexedPairedEndBarcodeInSequence
  The sequences that were unmatched to barcodes. [required]
Examples¶
paired¶
wget -O 'seqs.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/seqs.qza'
wget -O 'md.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/md.tsv'
qiime cutadapt demux-paired \
  --i-seqs seqs.qza \
  --m-forward-barcodes-file md.tsv \
  --m-forward-barcodes-column barcode-sequence \
  --o-per-sample-sequences per-sample-sequences.qza \
  --o-untrimmed-sequences untrimmed-sequences.qza
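When this command is driven from a script, it can help to assemble the argument list programmatically. The sketch below only builds the list and does not run anything. The helper `build_demux_paired_command` is hypothetical. The flags `--p-mixed-orientation`, `--m-forward-barcodes-file`, and `--m-forward-barcodes-column` appear in this page; `--p-error-rate`, `--p-cores`, and the `--m-reverse-barcodes-*` pair are assumed from the parameter names and the usual QIIME 2 command-line naming, so confirm them with `qiime cutadapt demux-paired --help`.

```python
import shlex


def build_demux_paired_command(seqs, metadata, forward_column,
                               reverse_column=None, error_rate=None,
                               cores=None, mixed_orientation=False):
    """Return an argv list for `qiime cutadapt demux-paired` (not executed)."""
    cmd = [
        "qiime", "cutadapt", "demux-paired",
        "--i-seqs", seqs,
        "--m-forward-barcodes-file", metadata,
        "--m-forward-barcodes-column", forward_column,
    ]
    if reverse_column is not None:
        # Assumed flag names, mirroring the forward-barcode flags.
        cmd += ["--m-reverse-barcodes-file", metadata,
                "--m-reverse-barcodes-column", reverse_column]
    if error_rate is not None:
        cmd += ["--p-error-rate", str(error_rate)]  # assumed flag name
    if cores is not None:
        cmd += ["--p-cores", str(cores)]            # assumed flag name
    if mixed_orientation:
        cmd.append("--p-mixed-orientation")
    cmd += ["--o-per-sample-sequences", "per-sample-sequences.qza",
            "--o-untrimmed-sequences", "untrimmed-sequences.qza"]
    return cmd


cmd = build_demux_paired_command("seqs.qza", "md.tsv", "barcode-sequence",
                                 error_rate=0.05, cores=4)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` in an environment where QIIME 2 is installed.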
from qiime2 import Artifact
from qiime2 import Metadata
from urllib import request
import qiime2.plugins.cutadapt.actions as cutadapt_actions

# Download the multiplexed paired-end sequences and load them as an Artifact.
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/seqs.qza'
fn = 'seqs.qza'
request.urlretrieve(url, fn)
seqs = Artifact.load(fn)

# Download the sample metadata and select the forward barcode column.
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/md.tsv'
fn = 'md.tsv'
request.urlretrieve(url, fn)
md_md = Metadata.load(fn)
barcodes_mdc = md_md.get_column('barcode-sequence')

# Demultiplex; the action returns the per-sample reads and the unmatched reads.
per_sample_sequences, untrimmed_sequences = cutadapt_actions.demux_paired(
    seqs=seqs,
    forward_barcodes=barcodes_mdc,
)
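Before demultiplexing, it may be worth checking that the barcode column exists and that no two samples share a barcode, since shared barcodes cannot distinguish samples. The following is an optional pre-flight sketch in plain Python, not part of QIIME 2. The `barcode_problems` helper is hypothetical, and the rows shown are invented for illustration; they do not reproduce the contents of the example md.tsv.

```python
import csv
import io
from collections import Counter


def barcode_problems(tsv_text, column):
    """Return a list of human-readable problems found in the barcode column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fields = reader.fieldnames or []
    if column not in fields:
        return [f"column {column!r} not found"]
    id_field = fields[0]  # the first column holds the sample ids
    # Skip metadata directive rows such as "#q2:types".
    rows = [row for row in reader if not row[id_field].startswith("#q2:")]
    counts = Counter((row[column] or "") for row in rows)
    problems = []
    for barcode, n in counts.items():
        if barcode == "":
            problems.append(f"{n} sample(s) have no barcode")
        elif n > 1:
            problems.append(f"barcode {barcode} is shared by {n} samples")
    return problems


# Hypothetical metadata; sample_a and sample_c share a barcode.
example = (
    "sample-id\tbarcode-sequence\n"
    "#q2:types\tcategorical\n"
    "sample_a\tAAAA\n"
    "sample_b\tCCCC\n"
    "sample_c\tAAAA\n"
)
print(barcode_problems(example, "barcode-sequence"))
```

An empty list means the sketch found nothing to flag; it does not validate the metadata file in the way QIIME 2 itself does.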
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: seqs.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/seqs.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: md.tsv
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/md.tsv
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 cutadapt demux-paired tool:
  - Set "seqs" to #: seqs.qza
  - For "forward_barcodes":
    - Leave as Metadata from TSV
    - Set "Metadata Source" to md.tsv
    - Set "Column Name" to barcode-sequence
  - Press the Execute button.
library(reticulate)

Artifact <- import("qiime2")$Artifact
Metadata <- import("qiime2")$Metadata
cutadapt_actions <- import("qiime2.plugins.cutadapt.actions")
request <- import("urllib")$request

# Download the multiplexed paired-end sequences and load them as an Artifact.
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/seqs.qza'
fn <- 'seqs.qza'
request$urlretrieve(url, fn)
seqs <- Artifact$load(fn)

# Download the sample metadata and select the forward barcode column.
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/md.tsv'
fn <- 'md.tsv'
request$urlretrieve(url, fn)
md_md <- Metadata$load(fn)
barcodes_mdc <- md_md$get_column('barcode-sequence')

# Demultiplex, then pull the two outputs out of the results object.
action_results <- cutadapt_actions$demux_paired(
  seqs=seqs,
  forward_barcodes=barcodes_mdc
)
per_sample_sequences <- action_results$per_sample_sequences
untrimmed_sequences <- action_results$untrimmed_sequences
from q2_cutadapt._examples import cutadapt_demux_paired
cutadapt_demux_paired(use)
This QIIME 2 plugin uses cutadapt to work with adapters (e.g. barcodes, primers) in sequence data.
- version:
2024.10.0
- website: https://
github .com /qiime2 /q2 -cutadapt - user support:
- Please post to the QIIME 2 forum for help with this plugin: https://
forum .qiime2 .org - citations:
- Martin, 2011
Actions¶
Name | Type | Short Description |
---|---|---|
trim-single | method | Find and remove adapters in demultiplexed single-end sequences. |
trim-paired | method | Find and remove adapters in demultiplexed paired-end sequences. |
demux-single | method | Demultiplex single-end sequence data with barcodes in-sequence. |
demux-paired | method | Demultiplex paired-end sequence data with barcodes in-sequence. |
cutadapt trim-single¶
Search demultiplexed single-end sequences for adapters and remove them. The parameter descriptions in this method are adapted from the official cutadapt docs - please see those docs at https://
Citations¶
Inputs¶
- demultiplexed_sequences:
SampleData[SequencesWithQuality]
The single-end sequences to be trimmed.[required]
Parameters¶
- cores:
Threads
Number of CPU cores to use.[default:
1
]- adapter:
List
[
Str
]
Sequence of an adapter ligated to the 3' end. The adapter and any subsequent bases are trimmed. If a
$
is appended, the adapter is only found if it is at the end of the read. If your sequence of interest is "framed" by a 5' and a 3' adapter, use this parameter to define a "linked" primer - see https://cutadapt .readthedocs .io for complete details.[optional] - front:
List
[
Str
]
Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Partial matches at the 5' end are allowed. If a
^
character is prepended, the adapter is only found if it is at the beginning of the read.[optional]- anywhere:
List
[
Str
]
Sequence of an adapter that may be ligated to the 5' or 3' end. Both types of matches as described under
adapter
andfront
are allowed. If the first base of the read is part of the match, the behavior is as withfront
, otherwise as withadapter
. This option is mostly for rescuing failed library preparations - do not use if you know which end your adapter was ligated to.[optional]- error_rate:
Float
%
Range
(0, 1, inclusive_end=True)
Maximum allowed error rate.[default:
0.1
]- indels:
Bool
Allow insertions or deletions of bases when matching adapters.[default:
True
]- times:
Int
%
Range
(1, None)
Remove multiple occurrences of an adapter if it is repeated, up to
times
times.[default:1
]- overlap:
Int
%
Range
(1, None)
Require at least
overlap
bases of overlap between read and adapter for an adapter to be found.[default:3
]- match_read_wildcards:
Bool
Interpret IUPAC wildcards (e.g., N) in reads.[default:
False
]- match_adapter_wildcards:
Bool
Interpret IUPAC wildcards (e.g., N) in adapters.[default:
True
]- minimum_length:
Int
%
Range
(1, None)
Discard reads shorter than specified value. Note, the cutadapt default of 0 has been overridden, because that value produces empty sequence records.[default:
1
]- discard_untrimmed:
Bool
Discard reads in which no adapter was found.[default:
False
]- max_expected_errors:
Float
%
Range
(0, None)
Discard reads that exceed maximum expected erroneous nucleotides.[optional]
- max_n:
Float
%
Range
(0, None)
Discard reads with more than COUNT N bases. If COUNT_or_FRACTION is a number between 0 and 1, it is interpreted as a fraction of the read length.[optional]
- quality_cutoff_5end:
Int
%
Range
(0, None)
Trim nucleotides with Phred score quality lower than threshold from 5 prime end.[default:
0
]- quality_cutoff_3end:
Int
%
Range
(0, None)
Trim nucleotides with Phred score quality lower than threshold from 3 prime end.[default:
0
]- quality_base:
Int
%
Range
(0, None)
How the Phred score is encoded (33 or 64).[default:
33
]
Outputs¶
- trimmed_sequences:
SampleData[SequencesWithQuality]
The resulting trimmed sequences.[required]
cutadapt trim-paired¶
Search demultiplexed paired-end sequences for adapters and remove them. The parameter descriptions in this method are adapted from the official cutadapt docs - please see those docs at https://
Citations¶
Inputs¶
- demultiplexed_sequences:
SampleData[PairedEndSequencesWithQuality]
The paired-end sequences to be trimmed.[required]
Parameters¶
- cores: Threads
  Number of CPU cores to use. [default: 1]
- adapter_f: List[Str]
  Sequence of an adapter ligated to the 3' end. The adapter and any subsequent bases are trimmed. If a $ is appended, the adapter is only found if it is at the end of the read. Search in forward read. If your sequence of interest is "framed" by a 5' and a 3' adapter, use this parameter to define a "linked" primer - see https://cutadapt.readthedocs.io for complete details. [optional]
- front_f: List[Str]
  Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Partial matches at the 5' end are allowed. If a ^ character is prepended, the adapter is only found if it is at the beginning of the read. Search in forward read. [optional]
- anywhere_f: List[Str]
  Sequence of an adapter that may be ligated to the 5' or 3' end. Both types of matches as described under adapter and front are allowed. If the first base of the read is part of the match, the behavior is as with front, otherwise as with adapter. This option is mostly for rescuing failed library preparations - do not use if you know which end your adapter was ligated to. Search in forward read. [optional]
- adapter_r: List[Str]
  Sequence of an adapter ligated to the 3' end. The adapter and any subsequent bases are trimmed. If a $ is appended, the adapter is only found if it is at the end of the read. Search in reverse read. If your sequence of interest is "framed" by a 5' and a 3' adapter, use this parameter to define a "linked" primer - see https://cutadapt.readthedocs.io for complete details. [optional]
- front_r: List[Str]
  Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Partial matches at the 5' end are allowed. If a ^ character is prepended, the adapter is only found if it is at the beginning of the read. Search in reverse read. [optional]
- anywhere_r: List[Str]
  Sequence of an adapter that may be ligated to the 5' or 3' end. Both types of matches as described under adapter and front are allowed. If the first base of the read is part of the match, the behavior is as with front, otherwise as with adapter. This option is mostly for rescuing failed library preparations - do not use if you know which end your adapter was ligated to. Search in reverse read. [optional]
- error_rate: Float % Range(0, 1, inclusive_end=True)
  Maximum allowed error rate. [default: 0.1]
- indels: Bool
  Allow insertions or deletions of bases when matching adapters. [default: True]
- times: Int % Range(1, None)
  Remove multiple occurrences of an adapter if it is repeated, up to "times" times. [default: 1]
- overlap: Int % Range(1, None)
  Require at least "overlap" bases of overlap between read and adapter for an adapter to be found. [default: 3]
- match_read_wildcards: Bool
  Interpret IUPAC wildcards (e.g., N) in reads. [default: False]
- match_adapter_wildcards: Bool
  Interpret IUPAC wildcards (e.g., N) in adapters. [default: True]
- minimum_length: Int % Range(1, None)
  Discard reads shorter than specified value. Note, the cutadapt default of 0 has been overridden, because that value produces empty sequence records. [default: 1]
- discard_untrimmed: Bool
  Discard reads in which no adapter was found. [default: False]
- max_expected_errors: Float % Range(0, None)
  Discard reads that exceed maximum expected erroneous nucleotides. [optional]
- max_n: Float % Range(0, None)
  Discard reads with more than this number of N bases. If the value is a number between 0 and 1, it is interpreted as a fraction of the read length. [optional]
- quality_cutoff_5end: Int % Range(0, None)
  Trim nucleotides with Phred score quality lower than threshold from 5 prime end. [default: 0]
- quality_cutoff_3end: Int % Range(0, None)
  Trim nucleotides with Phred score quality lower than threshold from 3 prime end. [default: 0]
- quality_base: Int % Range(0, None)
  How the Phred score is encoded (33 or 64). [default: 33]
Outputs¶
- trimmed_sequences:
SampleData[PairedEndSequencesWithQuality]
The resulting trimmed sequences.[required]
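As a rough illustration of how error_rate and overlap interact when an adapter match is evaluated, the sketch below assumes that the number of tolerated errors is the matched length multiplied by the error rate, rounded down, and that a match shorter than overlap is ignored. The function names are hypothetical, and cutadapt's real alignment logic (indels, wildcards, anchoring) is more involved than this arithmetic.

```python
import math


def allowed_errors(match_length: int, error_rate: float = 0.1) -> int:
    """Errors tolerated for a match of the given length.

    Assumption: the product is rounded down, so a 10-base match at the
    default rate of 0.1 tolerates one error and a 9-base match none.
    """
    return math.floor(match_length * error_rate)


def match_accepted(match_length: int, errors: int,
                   error_rate: float = 0.1, overlap: int = 3) -> bool:
    """Accept a candidate match only if it is long enough and clean enough."""
    if match_length < overlap:
        return False  # too little overlap between read and adapter
    return errors <= allowed_errors(match_length, error_rate)


if __name__ == "__main__":
    for length, errors in [(2, 0), (9, 1), (10, 1), (10, 2)]:
        print(length, errors, match_accepted(length, errors))
```

Raising error_rate makes matching more permissive, while raising overlap reduces spurious short matches at the read ends.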
cutadapt demux-single¶
Demultiplex sequence data (i.e., map barcode reads to sample ids). Barcodes are expected to be located within the sequence data (versus the header, or a separate barcode file).
Citations¶
Inputs¶
- seqs:
MultiplexedSingleEndBarcodeInSequence
The single-end sequences to be demultiplexed.[required]
Parameters¶
- barcodes: MetadataColumn[Categorical]
  The sample metadata column listing the per-sample barcodes. [required]
- cut: Int
  Remove the specified number of bases from the sequences. Bases are removed before demultiplexing. If a positive value is provided, bases are removed from the beginning of the sequences. If a negative value is provided, bases are removed from the end of the sequences. [default: 0]
- anchor_barcode: Bool
  Anchor the barcode. The barcode is then expected to occur in full length at the beginning (5' end) of the sequence. Can speed up demultiplexing if used. [default: False]
- error_rate: Float % Range(0, 1, inclusive_end=True)
  The level of error tolerance, specified as the maximum allowable error rate. The default value specified by cutadapt is 0.1 (=10%), which is greater than that of demux emp-*, which is 0.0 (=0%). [default: 0.1]
- batch_size: Int % Range(0, None)
  The number of samples cutadapt demultiplexes concurrently. Demultiplexing in smaller batches will yield the same result with marginal speed loss, and may solve "too many files" errors related to sample quantity. Set to "0" to process all samples at once. [default: 0]
- minimum_length: Int % Range(1, None)
  Discard reads shorter than specified value. Note, the cutadapt default of 0 has been overridden, because that value produces empty sequence records. [default: 1]
- cores: Threads
  Number of CPU cores to use. [default: 1]
Outputs¶
- per_sample_sequences:
SampleData[SequencesWithQuality]
The resulting demultiplexed sequences.[required]
- untrimmed_sequences:
MultiplexedSingleEndBarcodeInSequence
The sequences that were unmatched to barcodes.[required]
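The cut and anchor_barcode parameters described above can be pictured with the following minimal sketch. It follows the documented sign convention for cut (positive removes bases from the beginning, negative from the end) and models an anchored barcode as a full-length prefix comparison that tolerates substitutions only. Both function names are hypothetical, and the mismatch rule is a simplifying assumption rather than cutadapt's matching algorithm.

```python
def apply_cut(sequence: str, cut: int = 0) -> str:
    """Remove bases before demultiplexing.

    Positive values trim from the 5' end, negative values from the
    3' end, and 0 leaves the sequence unchanged.
    """
    if cut > 0:
        return sequence[cut:]
    if cut < 0:
        return sequence[:cut]
    return sequence


def has_anchored_barcode(sequence: str, barcode: str,
                         error_rate: float = 0.1) -> bool:
    """Check whether the barcode occurs in full length at the 5' end.

    Assumption: only substitutions are counted, and the number of
    tolerated mismatches is error_rate * len(barcode), rounded down.
    """
    prefix = sequence[:len(barcode)]
    if len(prefix) < len(barcode):
        return False  # read is shorter than the barcode
    mismatches = sum(a != b for a, b in zip(prefix, barcode))
    return mismatches <= int(error_rate * len(barcode))


if __name__ == "__main__":
    read = apply_cut("NNACGTTTTT", 2)  # drop two leading bases first
    print(read, has_anchored_barcode(read, "ACGT"))
```

Because bases are removed before demultiplexing, a cut value shifts where the barcode is expected to begin.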
Examples¶
demux_single¶
wget -O 'seqs.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/seqs.qza'

wget -O 'md.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/md.tsv'

qiime cutadapt demux-single \
  --i-seqs seqs.qza \
  --m-barcodes-file md.tsv \
  --m-barcodes-column BarcodeSequence \
  --o-per-sample-sequences per-sample-sequences.qza \
  --o-untrimmed-sequences untrimmed-sequences.qza

from qiime2 import Artifact
from qiime2 import Metadata
from urllib import request
import qiime2.plugins.cutadapt.actions as cutadapt_actions

url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/seqs.qza'
fn = 'seqs.qza'
request.urlretrieve(url, fn)
seqs = Artifact.load(fn)

url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/md.tsv'
fn = 'md.tsv'
request.urlretrieve(url, fn)
md_md = Metadata.load(fn)

barcodes_mdc = md_md.get_column('BarcodeSequence')

per_sample_sequences, untrimmed_sequences = cutadapt_actions.demux_single(
    seqs=seqs,
    barcodes=barcodes_mdc,
)
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: seqs.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/seqs.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: md.tsv
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/md.tsv
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 cutadapt demux-single tool:
  - Set "seqs" to #: seqs.qza
  - For "barcodes":
    - Leave as Metadata from TSV
    - Set "Metadata Source" to md.tsv
    - Set "Column Name" to BarcodeSequence
  - Press the Execute button.
library(reticulate)

Artifact <- import("qiime2")$Artifact
Metadata <- import("qiime2")$Metadata
cutadapt_actions <- import("qiime2.plugins.cutadapt.actions")
request <- import("urllib")$request

url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/seqs.qza'
fn <- 'seqs.qza'
request$urlretrieve(url, fn)
seqs <- Artifact$load(fn)

url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/md.tsv'
fn <- 'md.tsv'
request$urlretrieve(url, fn)
md_md <- Metadata$load(fn)

barcodes_mdc <- md_md$get_column('BarcodeSequence')

action_results <- cutadapt_actions$demux_single(
    seqs=seqs,
    barcodes=barcodes_mdc,
)
per_sample_sequences <- action_results$per_sample_sequences
untrimmed_sequences <- action_results$untrimmed_sequences

from q2_cutadapt._examples import cutadapt_demux_single
cutadapt_demux_single(use)
cutadapt demux-paired¶
Demultiplex sequence data (i.e., map barcode reads to sample ids). Barcodes are expected to be located within the sequence data (versus the header, or a separate barcode file).
Citations¶
Inputs¶
- seqs:
MultiplexedPairedEndBarcodeInSequence
The paired-end sequences to be demultiplexed.[required]
Parameters¶
- forward_barcodes: MetadataColumn[Categorical]
  The sample metadata column listing the per-sample barcodes for the forward reads. [required]
- reverse_barcodes: MetadataColumn[Categorical]
  The sample metadata column listing the per-sample barcodes for the reverse reads. [optional]
- forward_cut: Int
  Remove the specified number of bases from the forward sequences. Bases are removed before demultiplexing. If a positive value is provided, bases are removed from the beginning of the sequences. If a negative value is provided, bases are removed from the end of the sequences. If --p-mixed-orientation is set, then both --p-forward-cut and --p-reverse-cut must be set to the same value. [default: 0]
- reverse_cut: Int
  Remove the specified number of bases from the reverse sequences. Bases are removed before demultiplexing. If a positive value is provided, bases are removed from the beginning of the sequences. If a negative value is provided, bases are removed from the end of the sequences. If --p-mixed-orientation is set, then both --p-forward-cut and --p-reverse-cut must be set to the same value. [default: 0]
- anchor_forward_barcode: Bool
  Anchor the forward barcode. The barcode is then expected to occur in full length at the beginning (5' end) of the forward sequence. Can speed up demultiplexing if used. [default: False]
- anchor_reverse_barcode: Bool
  Anchor the reverse barcode. The barcode is then expected to occur in full length at the beginning (5' end) of the reverse sequence. Can speed up demultiplexing if used. [default: False]
- error_rate: Float % Range(0, 1, inclusive_end=True)
  The level of error tolerance, specified as the maximum allowable error rate. [default: 0.1]
- batch_size: Int % Range(0, None)
  The number of samples cutadapt demultiplexes concurrently. Demultiplexing in smaller batches will yield the same result with marginal speed loss, and may solve "too many files" errors related to sample quantity. Set to "0" to process all samples at once. [default: 0]
- minimum_length: Int % Range(1, None)
  Discard reads shorter than specified value. Note, the cutadapt default of 0 has been overridden, because that value produces empty sequence records. [default: 1]
- mixed_orientation: Bool
  Handle demultiplexing of mixed orientation reads (i.e. when forward and reverse reads coexist in the same file). [default: False]
- cores: Threads
  Number of CPU cores to use. [default: 1]
Outputs¶
- per_sample_sequences:
SampleData[PairedEndSequencesWithQuality]
The resulting demultiplexed sequences.[required]
- untrimmed_sequences:
MultiplexedPairedEndBarcodeInSequence
The sequences that were unmatched to barcodes.[required]
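The batch_size parameter described above splits the samples into groups that are demultiplexed one group at a time. The sketch below shows only that grouping step, using a hypothetical helper and made-up sample IDs. How the plugin passes reads from one batch to the next is not described here, so it is left out.

```python
def split_into_batches(sample_ids: list[str],
                       batch_size: int = 0) -> list[list[str]]:
    """Group sample IDs into batches of at most batch_size.

    A batch_size of 0 means "process all samples at once", so a single
    batch containing every sample is returned.
    """
    if batch_size < 0:
        raise ValueError("batch_size must be 0 or a positive integer")
    if batch_size == 0:
        return [list(sample_ids)]
    return [
        sample_ids[start:start + batch_size]
        for start in range(0, len(sample_ids), batch_size)
    ]


if __name__ == "__main__":
    ids = ["s1", "s2", "s3", "s4", "s5"]  # hypothetical sample IDs
    for batch in split_into_batches(ids, batch_size=2):
        print(batch)
```

Smaller batches keep fewer per-sample output files open at once, which is why a nonzero batch_size may help with "too many files" errors.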
Examples¶
paired¶
wget -O 'seqs.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/seqs.qza'

wget -O 'md.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/md.tsv'

qiime cutadapt demux-paired \
  --i-seqs seqs.qza \
  --m-forward-barcodes-file md.tsv \
  --m-forward-barcodes-column barcode-sequence \
  --o-per-sample-sequences per-sample-sequences.qza \
  --o-untrimmed-sequences untrimmed-sequences.qza

from qiime2 import Artifact
from qiime2 import Metadata
from urllib import request
import qiime2.plugins.cutadapt.actions as cutadapt_actions

url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/seqs.qza'
fn = 'seqs.qza'
request.urlretrieve(url, fn)
seqs = Artifact.load(fn)

url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/md.tsv'
fn = 'md.tsv'
request.urlretrieve(url, fn)
md_md = Metadata.load(fn)

barcodes_mdc = md_md.get_column('barcode-sequence')

per_sample_sequences, untrimmed_sequences = cutadapt_actions.demux_paired(
    seqs=seqs,
    forward_barcodes=barcodes_mdc,
)
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: seqs.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/seqs.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: md.tsv
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/md.tsv
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 cutadapt demux-paired tool:
  - Set "seqs" to #: seqs.qza
  - For "forward_barcodes":
    - Leave as Metadata from TSV
    - Set "Metadata Source" to md.tsv
    - Set "Column Name" to barcode-sequence
  - Press the Execute button.
library(reticulate)

Artifact <- import("qiime2")$Artifact
Metadata <- import("qiime2")$Metadata
cutadapt_actions <- import("qiime2.plugins.cutadapt.actions")
request <- import("urllib")$request

url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/seqs.qza'
fn <- 'seqs.qza'
request$urlretrieve(url, fn)
seqs <- Artifact$load(fn)

url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/md.tsv'
fn <- 'md.tsv'
request$urlretrieve(url, fn)
md_md <- Metadata$load(fn)

barcodes_mdc <- md_md$get_column('barcode-sequence')

action_results <- cutadapt_actions$demux_paired(
    seqs=seqs,
    forward_barcodes=barcodes_mdc,
)
per_sample_sequences <- action_results$per_sample_sequences
untrimmed_sequences <- action_results$untrimmed_sequences

from q2_cutadapt._examples import cutadapt_demux_paired
cutadapt_demux_paired(use)
This QIIME 2 plugin uses cutadapt to work with adapters (e.g. barcodes, primers) in sequence data.
- version:
2024.10.0
- website: https://
github .com /qiime2 /q2 -cutadapt - user support:
- Please post to the QIIME 2 forum for help with this plugin: https://
forum .qiime2 .org - citations:
- Martin, 2011
Actions¶
Name | Type | Short Description |
---|---|---|
trim-single | method | Find and remove adapters in demultiplexed single-end sequences. |
trim-paired | method | Find and remove adapters in demultiplexed paired-end sequences. |
demux-single | method | Demultiplex single-end sequence data with barcodes in-sequence. |
demux-paired | method | Demultiplex paired-end sequence data with barcodes in-sequence. |
cutadapt trim-single¶
Search demultiplexed single-end sequences for adapters and remove them. The parameter descriptions in this method are adapted from the official cutadapt docs - please see those docs at https://
Citations¶
Inputs¶
- demultiplexed_sequences:
SampleData[SequencesWithQuality]
The single-end sequences to be trimmed.[required]
Parameters¶
- cores:
Threads
Number of CPU cores to use.[default:
1
]- adapter:
List
[
Str
]
Sequence of an adapter ligated to the 3' end. The adapter and any subsequent bases are trimmed. If a
$
is appended, the adapter is only found if it is at the end of the read. If your sequence of interest is "framed" by a 5' and a 3' adapter, use this parameter to define a "linked" primer - see https://cutadapt .readthedocs .io for complete details.[optional] - front:
List
[
Str
]
Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Partial matches at the 5' end are allowed. If a
^
character is prepended, the adapter is only found if it is at the beginning of the read.[optional]- anywhere:
List
[
Str
]
Sequence of an adapter that may be ligated to the 5' or 3' end. Both types of matches as described under
adapter
andfront
are allowed. If the first base of the read is part of the match, the behavior is as withfront
, otherwise as withadapter
. This option is mostly for rescuing failed library preparations - do not use if you know which end your adapter was ligated to.[optional]- error_rate:
Float
%
Range
(0, 1, inclusive_end=True)
Maximum allowed error rate.[default:
0.1
]- indels:
Bool
Allow insertions or deletions of bases when matching adapters.[default:
True
]- times:
Int
%
Range
(1, None)
Remove multiple occurrences of an adapter if it is repeated, up to
times
times.[default:1
]- overlap:
Int
%
Range
(1, None)
Require at least
overlap
bases of overlap between read and adapter for an adapter to be found.[default:3
]- match_read_wildcards:
Bool
Interpret IUPAC wildcards (e.g., N) in reads.[default:
False
]- match_adapter_wildcards:
Bool
Interpret IUPAC wildcards (e.g., N) in adapters.[default:
True
]- minimum_length:
Int
%
Range
(1, None)
Discard reads shorter than specified value. Note, the cutadapt default of 0 has been overridden, because that value produces empty sequence records.[default:
1
]- discard_untrimmed:
Bool
Discard reads in which no adapter was found.[default:
False
]- max_expected_errors:
Float
%
Range
(0, None)
Discard reads that exceed maximum expected erroneous nucleotides.[optional]
- max_n:
Float
%
Range
(0, None)
Discard reads with more than COUNT N bases. If COUNT_or_FRACTION is a number between 0 and 1, it is interpreted as a fraction of the read length.[optional]
- quality_cutoff_5end:
Int
%
Range
(0, None)
Trim nucleotides with Phred score quality lower than threshold from 5 prime end.[default:
0
]- quality_cutoff_3end:
Int
%
Range
(0, None)
Trim nucleotides with Phred score quality lower than threshold from 3 prime end.[default:
0
]- quality_base:
Int
%
Range
(0, None)
How the Phred score is encoded (33 or 64).[default:
33
]
Outputs¶
- trimmed_sequences:
SampleData[SequencesWithQuality]
The resulting trimmed sequences.[required]
cutadapt trim-paired¶
Search demultiplexed paired-end sequences for adapters and remove them. The parameter descriptions in this method are adapted from the official cutadapt docs - please see those docs at https://
Citations¶
Inputs¶
- demultiplexed_sequences:
SampleData[PairedEndSequencesWithQuality]
The paired-end sequences to be trimmed.[required]
Parameters¶
- cores:
Threads
Number of CPU cores to use.[default:
1
]- adapter_f:
List
[
Str
]
Sequence of an adapter ligated to the 3' end. The adapter and any subsequent bases are trimmed. If a
$
is appended, the adapter is only found if it is at the end of the read. Search in forward read. If your sequence of interest is "framed" by a 5' and a 3' adapter, use this parameter to define a "linked" primer - see https://cutadapt .readthedocs .io for complete details.[optional] - front_f:
List
[
Str
]
Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Partial matches at the 5' end are allowed. If a
^
character is prepended, the adapter is only found if it is at the beginning of the read. Search in forward read.[optional]- anywhere_f:
List
[
Str
]
Sequence of an adapter that may be ligated to the 5' or 3' end. Both types of matches as described under
adapter
andfront
are allowed. If the first base of the read is part of the match, the behavior is as withfront
, otherwise as withadapter
. This option is mostly for rescuing failed library preparations - do not use if you know which end your adapter was ligated to. Search in forward read.[optional]- adapter_r:
List
[
Str
]
Sequence of an adapter ligated to the 3' end. The adapter and any subsequent bases are trimmed. If a
$
is appended, the adapter is only found if it is at the end of the read. Search in reverse read. If your sequence of interest is "framed" by a 5' and a 3' adapter, use this parameter to define a "linked" primer - see https://cutadapt .readthedocs .io for complete details.[optional] - front_r:
List
[
Str
]
Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Partial matches at the 5' end are allowed. If a
^
character is prepended, the adapter is only found if it is at the beginning of the read. Search in reverse read.[optional]- anywhere_r:
List
[
Str
]
Sequence of an adapter that may be ligated to the 5' or 3' end. Both types of matches as described under
adapter
andfront
are allowed. If the first base of the read is part of the match, the behavior is as withfront
, otherwise as withadapter
. This option is mostly for rescuing failed library preparations - do not use if you know which end your adapter was ligated to. Search in reverse read.[optional]- error_rate:
Float
%
Range
(0, 1, inclusive_end=True)
Maximum allowed error rate.[default:
0.1
]- indels:
Bool
Allow insertions or deletions of bases when matching adapters.[default:
True
]- times:
Int
%
Range
(1, None)
Remove multiple occurrences of an adapter if it is repeated, up to
times
times.[default:1
]- overlap:
Int
%
Range
(1, None)
Require at least
overlap
bases of overlap between read and adapter for an adapter to be found.[default:3
]- match_read_wildcards:
Bool
Interpret IUPAC wildcards (e.g., N) in reads.[default:
False
]- match_adapter_wildcards:
Bool
Interpret IUPAC wildcards (e.g., N) in adapters.[default:
True
]- minimum_length:
Int
%
Range
(1, None)
Discard reads shorter than specified value. Note, the cutadapt default of 0 has been overridden, because that value produces empty sequence records.[default:
1
]- discard_untrimmed:
Bool
Discard reads in which no adapter was found.[default:
False
]- max_expected_errors:
Float
%
Range
(0, None)
Discard reads that exceed maximum expected erroneous nucleotides.[optional]
- max_n:
Float
%
Range
(0, None)
Discard reads with more than COUNT N bases. If COUNT_or_FRACTION is a number between 0 and 1, it is interpreted as a fraction of the read length.[optional]
- quality_cutoff_5end:
Int
%
Range
(0, None)
Trim nucleotides with Phred score quality lower than threshold from 5 prime end.[default:
0
]- quality_cutoff_3end:
Int
%
Range
(0, None)
Trim nucleotides with Phred score quality lower than threshold from 3 prime end.[default:
0
]- quality_base:
Int
%
Range
(0, None)
How the Phred score is encoded (33 or 64).[default:
33
]
Outputs¶
- trimmed_sequences:
SampleData[PairedEndSequencesWithQuality]
The resulting trimmed sequences.[required]
cutadapt demux-single¶
Demultiplex sequence data (i.e., map barcode reads to sample ids). Barcodes are expected to be located within the sequence data (versus the header, or a separate barcode file).
Citations¶
Inputs¶
- seqs:
MultiplexedSingleEndBarcodeInSequence
The single-end sequences to be demultiplexed.[required]
Parameters¶
- barcodes:
MetadataColumn
[
Categorical
]
The sample metadata column listing the per-sample barcodes.[required]
- cut:
Int
Remove the specified number of bases from the sequences. Bases are removed before demultiplexing. If a positive value is provided, bases are removed from the beginning of the sequences. If a negative value is provided, bases are removed from the end of the sequences.[default:
0
]- anchor_barcode:
Bool
Anchor the barcode. The barcode is then expected to occur in full length at the beginning (5' end) of the sequence. Can speed up demultiplexing if used.[default:
False
]- error_rate:
Float
%
Range
(0, 1, inclusive_end=True)
The level of error tolerance, specified as the maximum allowable error rate. The default value specified by cutadapt is 0.1 (=10%), which is greater than
demux emp-*
, which is 0.0 (=0%).[default:0.1
]- batch_size:
Int
%
Range
(0, None)
The number of samples cutadapt demultiplexes concurrently. Demultiplexing in smaller batches will yield the same result with marginal speed loss, and may solve "too many files" errors related to sample quantity. Set to "0" to process all samples at once.[default:
0
]- minimum_length:
Int
%
Range
(1, None)
Discard reads shorter than specified value. Note, the cutadapt default of 0 has been overridden, because that value produces empty sequence records.[default:
1
]- cores:
Threads
Number of CPU cores to use.[default:
1
]
Outputs¶
- per_sample_sequences:
SampleData[SequencesWithQuality]
The resulting demultiplexed sequences.[required]
- untrimmed_sequences:
MultiplexedSingleEndBarcodeInSequence
The sequences that were unmatched to barcodes.[required]
Examples¶
demux_single¶
wget -O 'seqs.qza' \
'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/seqs.qza'
wget -O 'md.tsv' \
'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/md.tsv'
qiime cutadapt demux-single \
--i-seqs seqs.qza \
--m-barcodes-file md.tsv \
--m-barcodes-column BarcodeSequence \
--o-per-sample-sequences per-sample-sequences.qza \
--o-untrimmed-sequences untrimmed-sequences.qza
from qiime2 import Artifact
from qiime2 import Metadata
from urllib import request
import qiime2.plugins.cutadapt.actions as cutadapt_actions
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/seqs.qza'
fn = 'seqs.qza'
request.urlretrieve(url, fn)
seqs = Artifact.load(fn)
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/md.tsv'
fn = 'md.tsv'
request.urlretrieve(url, fn)
md_md = Metadata.load(fn)
barcodes_mdc = md_md.get_column('BarcodeSequence')
per_sample_sequences, untrimmed_sequences = cutadapt_actions.demux_single(
seqs=seqs,
barcodes=barcodes_mdc,
)
- Using the
Upload Data
tool: - On the first tab (Regular), press the
Paste/Fetch
data button at the bottom.- Set "Name" (first text-field) to:
seqs.qza
- In the larger text-area, copy-and-paste: https://
amplicon -docs .qiime2 .org /en /latest /data /examples /cutadapt /demux -single /1 /seqs .qza - ("Type", "Genome", and "Settings" can be ignored)
- Set "Name" (first text-field) to:
- Press the
Start
button at the bottom.
- On the first tab (Regular), press the
- Using the
Upload Data
tool: - On the first tab (Regular), press the
Paste/Fetch
data button at the bottom.- Set "Name" (first text-field) to:
md.tsv
- In the larger text-area, copy-and-paste: https://
amplicon -docs .qiime2 .org /en /latest /data /examples /cutadapt /demux -single /1 /md .tsv - ("Type", "Genome", and "Settings" can be ignored)
- Set "Name" (first text-field) to:
- Press the
Start
button at the bottom.
- On the first tab (Regular), press the
- Using the
qiime2 cutadapt demux-single
tool: - Set "seqs" to
#: seqs.qza
- For "barcodes":
- Leave as
Metadata from TSV
- Set "Metadata Source" to
md.tsv
- Set "Column Name" to
BarcodeSequence
- Leave as
- Press the
Execute
button.
- Set "seqs" to
library(reticulate)
Artifact <- import("qiime2")$Artifact
Metadata <- import("qiime2")$Metadata
cutadapt_actions <- import("qiime2.plugins.cutadapt.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/seqs.qza'
fn <- 'seqs.qza'
request$urlretrieve(url, fn)
seqs <- Artifact$load(fn)
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/md.tsv'
fn <- 'md.tsv'
request$urlretrieve(url, fn)
md_md <- Metadata$load(fn)
barcodes_mdc <- md_md$get_column('BarcodeSequence')
action_results <- cutadapt_actions$demux_single(
seqs=seqs,
barcodes=barcodes_mdc,
)
per_sample_sequences <- action_results$per_sample_sequences
untrimmed_sequences <- action_results$untrimmed_sequences
from q2_cutadapt._examples import cutadapt_demux_single
cutadapt_demux_single(use)
cutadapt demux-paired¶
Demultiplex sequence data (i.e., map barcode reads to sample ids). Barcodes are expected to be located within the sequence data (versus the header, or a separate barcode file).
Citations¶
Inputs¶
- seqs:
MultiplexedPairedEndBarcodeInSequence
The paired-end sequences to be demultiplexed.[required]
Parameters¶
- forward_barcodes:
MetadataColumn
[
Categorical
]
The sample metadata column listing the per-sample barcodes for the forward reads.[required]
- reverse_barcodes:
MetadataColumn
[
Categorical
]
The sample metadata column listing the per-sample barcodes for the reverse reads.[optional]
- forward_cut:
Int
Remove the specified number of bases from the forward sequences. Bases are removed before demultiplexing. If a positive value is provided, bases are removed from the beginning of the sequences. If a negative value is provided, bases are removed from the end of the sequences. If --p-mixed-orientation is set, then both --p-forward-cut and --p-reverse-cut must be set to the same value.[default:
0
]- reverse_cut:
Int
Remove the specified number of bases from the reverse sequences. Bases are removed before demultiplexing. If a positive value is provided, bases are removed from the beginning of the sequences. If a negative value is provided, bases are removed from the end of the sequences. If --p-mixed-orientation is set, then both --p-forward-cut and --p-reverse-cut must be set to the same value.[default:
0
]- anchor_forward_barcode:
Bool
Anchor the forward barcode. The barcode is then expected to occur in full length at the beginning (5' end) of the forward sequence. Can speed up demultiplexing if used.[default:
False
]- anchor_reverse_barcode:
Bool
Anchor the reverse barcode. The barcode is then expected to occur in full length at the beginning (5' end) of the reverse sequence. Can speed up demultiplexing if used.[default:
False
]- error_rate:
Float
%
Range
(0, 1, inclusive_end=True)
The level of error tolerance, specified as the maximum allowable error rate.[default:
0.1
]- batch_size:
Int
%
Range
(0, None)
The number of samples cutadapt demultiplexes concurrently. Demultiplexing in smaller batches will yield the same result with marginal speed loss, and may solve "too many files" errors related to sample quantity. Set to "0" to process all samples at once.[default:
0
]- minimum_length:
Int
%
Range
(1, None)
Discard reads shorter than specified value. Note, the cutadapt default of 0 has been overridden, because that value produces empty sequence records.[default:
1
]- mixed_orientation:
Bool
Handle demultiplexing of mixed orientation reads (i.e. when forward and reverse reads coexist in the same file).[default:
False
]- cores:
Threads
Number of CPU cores to use.[default:
1
]
Outputs¶
- per_sample_sequences:
SampleData[PairedEndSequencesWithQuality]
The resulting demultiplexed sequences.[required]
- untrimmed_sequences:
MultiplexedPairedEndBarcodeInSequence
The sequences that were unmatched to barcodes.[required]
Examples¶
paired¶
wget -O 'seqs.qza' \
'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/seqs.qza'
wget -O 'md.tsv' \
'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/md.tsv'
qiime cutadapt demux-paired \
--i-seqs seqs.qza \
--m-forward-barcodes-file md.tsv \
--m-forward-barcodes-column barcode-sequence \
--o-per-sample-sequences per-sample-sequences.qza \
--o-untrimmed-sequences untrimmed-sequences.qza
from qiime2 import Artifact
from qiime2 import Metadata
from urllib import request
import qiime2.plugins.cutadapt.actions as cutadapt_actions

# Download the multiplexed paired-end sequences and load them as an Artifact.
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/seqs.qza'
fn = 'seqs.qza'
request.urlretrieve(url, fn)
seqs = Artifact.load(fn)

# Download the sample metadata and select the column holding the forward barcodes.
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/md.tsv'
fn = 'md.tsv'
request.urlretrieve(url, fn)
md_md = Metadata.load(fn)
barcodes_mdc = md_md.get_column('barcode-sequence')

# Demultiplex; reads that match no barcode are returned as a separate artifact.
per_sample_sequences, untrimmed_sequences = cutadapt_actions.demux_paired(
    seqs=seqs,
    forward_barcodes=barcodes_mdc,
)
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: seqs.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/seqs.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: md.tsv
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/md.tsv
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 cutadapt demux-paired tool:
  - Set "seqs" to #: seqs.qza
  - For "forward_barcodes":
    - Leave as Metadata from TSV
    - Set "Metadata Source" to md.tsv
    - Set "Column Name" to barcode-sequence
  - Press the Execute button.
library(reticulate)
Artifact <- import("qiime2")$Artifact
Metadata <- import("qiime2")$Metadata
cutadapt_actions <- import("qiime2.plugins.cutadapt.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/seqs.qza'
fn <- 'seqs.qza'
request$urlretrieve(url, fn)
seqs <- Artifact$load(fn)
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/md.tsv'
fn <- 'md.tsv'
request$urlretrieve(url, fn)
md_md <- Metadata$load(fn)
barcodes_mdc <- md_md$get_column('barcode-sequence')
action_results <- cutadapt_actions$demux_paired(
seqs=seqs,
forward_barcodes=barcodes_mdc,
)
per_sample_sequences <- action_results$per_sample_sequences
untrimmed_sequences <- action_results$untrimmed_sequences
from q2_cutadapt._examples import cutadapt_demux_paired
cutadapt_demux_paired(use)
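The examples above select the `barcode-sequence` column of `md.tsv` and pass it as the forward barcodes. As a rough illustration of what such a column amounts to, the sketch below reads a tab-separated table with Python's standard library and builds a mapping from sample ID to barcode. The table contents, the `sample-id` header, and the helper name `read_barcode_column` are made up for illustration; the real `Metadata.get_column` performs validation that this sketch does not attempt.

```python
import csv
import io
from typing import Dict

# Hypothetical metadata table; the real md.tsv is not reproduced here.
MD_TSV = (
    "sample-id\tbarcode-sequence\n"
    "sampleA\tACGTACGT\n"
    "sampleB\tTTGGCCAA\n"
)


def read_barcode_column(tsv_text: str, column: str) -> Dict[str, str]:
    """Map each sample ID (first column) to its value in the named column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found")
    id_column = reader.fieldnames[0]  # assumption: IDs live in the first column
    return {row[id_column]: row[column] for row in reader}


barcodes = read_barcode_column(MD_TSV, "barcode-sequence")
print(barcodes)
```

Each sample needs its own barcode for demultiplexing to be unambiguous, so a quick check that the mapping's values are unique can be a useful sanity test before running the action.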
This QIIME 2 plugin uses cutadapt to work with adapters (e.g. barcodes, primers) in sequence data.
- version:
2024.10.0
- website: https://
github .com /qiime2 /q2 -cutadapt - user support:
- Please post to the QIIME 2 forum for help with this plugin: https://
forum .qiime2 .org - citations:
- Martin, 2011
Actions¶
Name | Type | Short Description |
---|---|---|
trim-single | method | Find and remove adapters in demultiplexed single-end sequences. |
trim-paired | method | Find and remove adapters in demultiplexed paired-end sequences. |
demux-single | method | Demultiplex single-end sequence data with barcodes in-sequence. |
demux-paired | method | Demultiplex paired-end sequence data with barcodes in-sequence. |
cutadapt trim-single¶
Search demultiplexed single-end sequences for adapters and remove them. The parameter descriptions in this method are adapted from the official cutadapt docs - please see those docs at https://
Citations¶
Inputs¶
- demultiplexed_sequences:
SampleData[SequencesWithQuality]
The single-end sequences to be trimmed.[required]
Parameters¶
- cores:
Threads
Number of CPU cores to use.[default:
1
]- adapter:
List
[
Str
]
Sequence of an adapter ligated to the 3' end. The adapter and any subsequent bases are trimmed. If a
$
is appended, the adapter is only found if it is at the end of the read. If your sequence of interest is "framed" by a 5' and a 3' adapter, use this parameter to define a "linked" primer - see https://cutadapt .readthedocs .io for complete details.[optional] - front:
List
[
Str
]
Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Partial matches at the 5' end are allowed. If a
^
character is prepended, the adapter is only found if it is at the beginning of the read.[optional]- anywhere:
List
[
Str
]
Sequence of an adapter that may be ligated to the 5' or 3' end. Both types of matches as described under
adapter
andfront
are allowed. If the first base of the read is part of the match, the behavior is as withfront
, otherwise as withadapter
. This option is mostly for rescuing failed library preparations - do not use if you know which end your adapter was ligated to.[optional]- error_rate:
Float
%
Range
(0, 1, inclusive_end=True)
Maximum allowed error rate.[default:
0.1
]- indels:
Bool
Allow insertions or deletions of bases when matching adapters.[default:
True
]- times:
Int
%
Range
(1, None)
Remove multiple occurrences of an adapter if it is repeated, up to
times
times.[default:1
]- overlap:
Int
%
Range
(1, None)
Require at least
overlap
bases of overlap between read and adapter for an adapter to be found.[default:3
]- match_read_wildcards:
Bool
Interpret IUPAC wildcards (e.g., N) in reads.[default:
False
]- match_adapter_wildcards:
Bool
Interpret IUPAC wildcards (e.g., N) in adapters.[default:
True
]- minimum_length:
Int
%
Range
(1, None)
Discard reads shorter than specified value. Note, the cutadapt default of 0 has been overridden, because that value produces empty sequence records.[default:
1
]- discard_untrimmed:
Bool
Discard reads in which no adapter was found.[default:
False
]- max_expected_errors:
Float
%
Range
(0, None)
Discard reads that exceed maximum expected erroneous nucleotides.[optional]
- max_n:
Float
%
Range
(0, None)
Discard reads with more than COUNT N bases. If COUNT_or_FRACTION is a number between 0 and 1, it is interpreted as a fraction of the read length.[optional]
- quality_cutoff_5end:
Int
%
Range
(0, None)
Trim nucleotides with Phred score quality lower than threshold from 5 prime end.[default:
0
]- quality_cutoff_3end:
Int
%
Range
(0, None)
Trim nucleotides with Phred score quality lower than threshold from 3 prime end.[default:
0
]- quality_base:
Int
%
Range
(0, None)
How the Phred score is encoded (33 or 64).[default:
33
]
Outputs¶
- trimmed_sequences:
SampleData[SequencesWithQuality]
The resulting trimmed sequences.[required]
cutadapt trim-paired¶
Search demultiplexed paired-end sequences for adapters and remove them. The parameter descriptions in this method are adapted from the official cutadapt docs - please see those docs at https://
Citations¶
Inputs¶
- demultiplexed_sequences:
SampleData[PairedEndSequencesWithQuality]
The paired-end sequences to be trimmed.[required]
Parameters¶
- cores:
Threads
Number of CPU cores to use.[default:
1
]- adapter_f:
List
[
Str
]
Sequence of an adapter ligated to the 3' end. The adapter and any subsequent bases are trimmed. If a
$
is appended, the adapter is only found if it is at the end of the read. Search in forward read. If your sequence of interest is "framed" by a 5' and a 3' adapter, use this parameter to define a "linked" primer - see https://cutadapt .readthedocs .io for complete details.[optional] - front_f:
List
[
Str
]
Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Partial matches at the 5' end are allowed. If a
^
character is prepended, the adapter is only found if it is at the beginning of the read. Search in forward read.[optional]- anywhere_f:
List
[
Str
]
Sequence of an adapter that may be ligated to the 5' or 3' end. Both types of matches as described under
adapter
andfront
are allowed. If the first base of the read is part of the match, the behavior is as withfront
, otherwise as withadapter
. This option is mostly for rescuing failed library preparations - do not use if you know which end your adapter was ligated to. Search in forward read.[optional]- adapter_r:
List
[
Str
]
Sequence of an adapter ligated to the 3' end. The adapter and any subsequent bases are trimmed. If a
$
is appended, the adapter is only found if it is at the end of the read. Search in reverse read. If your sequence of interest is "framed" by a 5' and a 3' adapter, use this parameter to define a "linked" primer - see https://cutadapt .readthedocs .io for complete details.[optional] - front_r:
List
[
Str
]
Sequence of an adapter ligated to the 5' end. The adapter and any preceding bases are trimmed. Partial matches at the 5' end are allowed. If a
^
character is prepended, the adapter is only found if it is at the beginning of the read. Search in reverse read.[optional]- anywhere_r:
List
[
Str
]
Sequence of an adapter that may be ligated to the 5' or 3' end. Both types of matches as described under
adapter
andfront
are allowed. If the first base of the read is part of the match, the behavior is as withfront
, otherwise as withadapter
. This option is mostly for rescuing failed library preparations - do not use if you know which end your adapter was ligated to. Search in reverse read.[optional]- error_rate:
Float
%
Range
(0, 1, inclusive_end=True)
Maximum allowed error rate.[default:
0.1
]- indels:
Bool
Allow insertions or deletions of bases when matching adapters.[default:
True
]- times:
Int
%
Range
(1, None)
Remove multiple occurrences of an adapter if it is repeated, up to
times
times.[default:1
]- overlap:
Int
%
Range
(1, None)
Require at least
overlap
bases of overlap between read and adapter for an adapter to be found.[default:3
]- match_read_wildcards:
Bool
Interpret IUPAC wildcards (e.g., N) in reads.[default:
False
]- match_adapter_wildcards:
Bool
Interpret IUPAC wildcards (e.g., N) in adapters.[default:
True
]- minimum_length:
Int
%
Range
(1, None)
Discard reads shorter than specified value. Note, the cutadapt default of 0 has been overridden, because that value produces empty sequence records.[default:
1
]- discard_untrimmed:
Bool
Discard reads in which no adapter was found.[default:
False
]- max_expected_errors:
Float
%
Range
(0, None)
Discard reads that exceed maximum expected erroneous nucleotides.[optional]
- max_n:
Float
%
Range
(0, None)
Discard reads with more than COUNT N bases. If COUNT_or_FRACTION is a number between 0 and 1, it is interpreted as a fraction of the read length.[optional]
- quality_cutoff_5end:
Int
%
Range
(0, None)
Trim nucleotides with Phred score quality lower than threshold from 5 prime end.[default:
0
]- quality_cutoff_3end:
Int
%
Range
(0, None)
Trim nucleotides with Phred score quality lower than threshold from 3 prime end.[default:
0
]- quality_base:
Int
%
Range
(0, None)
How the Phred score is encoded (33 or 64).[default:
33
]
Outputs¶
- trimmed_sequences:
SampleData[PairedEndSequencesWithQuality]
The resulting trimmed sequences.[required]
cutadapt demux-single¶
Demultiplex sequence data (i.e., map barcode reads to sample ids). Barcodes are expected to be located within the sequence data (versus the header, or a separate barcode file).
Citations¶
Inputs¶
- seqs:
MultiplexedSingleEndBarcodeInSequence
The single-end sequences to be demultiplexed.[required]
Parameters¶
- barcodes:
MetadataColumn
[
Categorical
]
The sample metadata column listing the per-sample barcodes.[required]
- cut:
Int
Remove the specified number of bases from the sequences. Bases are removed before demultiplexing. If a positive value is provided, bases are removed from the beginning of the sequences. If a negative value is provided, bases are removed from the end of the sequences.[default:
0
]- anchor_barcode:
Bool
Anchor the barcode. The barcode is then expected to occur in full length at the beginning (5' end) of the sequence. Can speed up demultiplexing if used.[default:
False
]- error_rate:
Float
%
Range
(0, 1, inclusive_end=True)
The level of error tolerance, specified as the maximum allowable error rate. The default value specified by cutadapt is 0.1 (=10%), which is greater than
demux emp-*
, which is 0.0 (=0%).[default:0.1
]- batch_size:
Int
%
Range
(0, None)
The number of samples cutadapt demultiplexes concurrently. Demultiplexing in smaller batches will yield the same result with marginal speed loss, and may solve "too many files" errors related to sample quantity. Set to "0" to process all samples at once.[default:
0
]- minimum_length:
Int
%
Range
(1, None)
Discard reads shorter than specified value. Note, the cutadapt default of 0 has been overridden, because that value produces empty sequence records.[default:
1
]- cores:
Threads
Number of CPU cores to use.[default:
1
]
Outputs¶
- per_sample_sequences:
SampleData[SequencesWithQuality]
The resulting demultiplexed sequences.[required]
- untrimmed_sequences:
MultiplexedSingleEndBarcodeInSequence
The sequences that were unmatched to barcodes.[required]
Examples¶
demux_single¶
wget -O 'seqs.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/seqs.qza'
wget -O 'md.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/md.tsv'
qiime cutadapt demux-single \
  --i-seqs seqs.qza \
  --m-barcodes-file md.tsv \
  --m-barcodes-column BarcodeSequence \
  --o-per-sample-sequences per-sample-sequences.qza \
  --o-untrimmed-sequences untrimmed-sequences.qza
from qiime2 import Artifact
from qiime2 import Metadata
from urllib import request
import qiime2.plugins.cutadapt.actions as cutadapt_actions
# Download and load the multiplexed sequences.
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/seqs.qza'
fn = 'seqs.qza'
request.urlretrieve(url, fn)
seqs = Artifact.load(fn)
# Download the sample metadata and select the barcode column.
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/md.tsv'
fn = 'md.tsv'
request.urlretrieve(url, fn)
md_md = Metadata.load(fn)
barcodes_mdc = md_md.get_column('BarcodeSequence')
# Demultiplex; the action returns the per-sample and the unmatched sequences.
per_sample_sequences, untrimmed_sequences = cutadapt_actions.demux_single(
    seqs=seqs,
    barcodes=barcodes_mdc,
)
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: seqs.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/seqs.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: md.tsv
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/md.tsv
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 cutadapt demux-single tool:
  - Set "seqs" to #: seqs.qza
  - For "barcodes":
    - Leave as Metadata from TSV
    - Set "Metadata Source" to md.tsv
    - Set "Column Name" to BarcodeSequence
  - Press the Execute button.
library(reticulate)
Artifact <- import("qiime2")$Artifact
Metadata <- import("qiime2")$Metadata
cutadapt_actions <- import("qiime2.plugins.cutadapt.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/seqs.qza'
fn <- 'seqs.qza'
request$urlretrieve(url, fn)
seqs <- Artifact$load(fn)
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-single/1/md.tsv'
fn <- 'md.tsv'
request$urlretrieve(url, fn)
md_md <- Metadata$load(fn)
barcodes_mdc <- md_md$get_column('BarcodeSequence')
action_results <- cutadapt_actions$demux_single(
  seqs=seqs,
  barcodes=barcodes_mdc,
)
per_sample_sequences <- action_results$per_sample_sequences
untrimmed_sequences <- action_results$untrimmed_sequences
from q2_cutadapt._examples import cutadapt_demux_single
cutadapt_demux_single(use)
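The examples above read barcodes from a column named BarcodeSequence in md.tsv. For readers preparing their own file, the following sketch writes a minimal tab-separated metadata file with that column and reads it back using only the Python standard library. The sample IDs, barcode values, and output file name are hypothetical; `sample-id` is used here as the identifier column header.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical sample IDs and barcodes, for illustration only.
rows = [
    ("sample_a", "AAAACCCC"),
    ("sample_b", "GGGGTTTT"),
]

# Write the file into a temporary directory so nothing is overwritten.
md_path = Path(tempfile.mkdtemp()) / "example-md.tsv"
with md_path.open("w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t")
    writer.writerow(["sample-id", "BarcodeSequence"])
    writer.writerows(rows)

# Read it back as a mapping of sample ID to barcode.
with md_path.open(newline="") as handle:
    barcodes = {
        row["sample-id"]: row["BarcodeSequence"]
        for row in csv.DictReader(handle, delimiter="\t")
    }

print(barcodes)  # {'sample_a': 'AAAACCCC', 'sample_b': 'GGGGTTTT'}
```

A file laid out this way would be supplied through --m-barcodes-file, with --m-barcodes-column set to BarcodeSequence, as in the command-line example.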
cutadapt demux-paired¶
Demultiplex sequence data (i.e., map barcode reads to sample ids). Barcodes are expected to be located within the sequence data (versus the header, or a separate barcode file).
Citations¶
Inputs¶
- seqs:
MultiplexedPairedEndBarcodeInSequence
The paired-end sequences to be demultiplexed.[required]
Parameters¶
- forward_barcodes:
MetadataColumn[Categorical]
The sample metadata column listing the per-sample barcodes for the forward reads.[required]
- reverse_barcodes:
MetadataColumn[Categorical]
The sample metadata column listing the per-sample barcodes for the reverse reads.[optional]
- forward_cut:
Int
Remove the specified number of bases from the forward sequences. Bases are removed before demultiplexing. If a positive value is provided, bases are removed from the beginning of the sequences. If a negative value is provided, bases are removed from the end of the sequences. If --p-mixed-orientation is set, then both --p-forward-cut and --p-reverse-cut must be set to the same value.[default: 0]
- reverse_cut:
Int
Remove the specified number of bases from the reverse sequences. Bases are removed before demultiplexing. If a positive value is provided, bases are removed from the beginning of the sequences. If a negative value is provided, bases are removed from the end of the sequences. If --p-mixed-orientation is set, then both --p-forward-cut and --p-reverse-cut must be set to the same value.[default: 0]
- anchor_forward_barcode:
Bool
Anchor the forward barcode. The barcode is then expected to occur in full length at the beginning (5' end) of the forward sequence. Can speed up demultiplexing if used.[default: False]
- anchor_reverse_barcode:
Bool
Anchor the reverse barcode. The barcode is then expected to occur in full length at the beginning (5' end) of the reverse sequence. Can speed up demultiplexing if used.[default: False]
- error_rate:
Float % Range(0, 1, inclusive_end=True)
The level of error tolerance, specified as the maximum allowable error rate.[default: 0.1]
- batch_size:
Int % Range(0, None)
The number of samples cutadapt demultiplexes concurrently. Demultiplexing in smaller batches will yield the same result with marginal speed loss, and may solve "too many files" errors related to sample quantity. Set to "0" to process all samples at once.[default: 0]
- minimum_length:
Int % Range(1, None)
Discard reads shorter than the specified value. Note that the cutadapt default of 0 has been overridden, because that value produces empty sequence records.[default: 1]
- mixed_orientation:
Bool
Handle demultiplexing of mixed orientation reads (i.e. when forward and reverse reads coexist in the same file).[default: False]
- cores:
Threads
Number of CPU cores to use.[default: 1]
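Two of the rules stated in the parameter descriptions above lend themselves to a short sketch: the requirement that forward_cut and reverse_cut agree when mixed_orientation is set, and the meaning of batch_size (with 0 standing for a single batch of all samples). The helpers `check_cuts` and `batches` below are hypothetical and only mirror the documented rules; they are not taken from the plugin's source.

```python
def check_cuts(forward_cut: int, reverse_cut: int, mixed_orientation: bool) -> None:
    """Raise if the cut values violate the mixed-orientation rule."""
    if mixed_orientation and forward_cut != reverse_cut:
        raise ValueError(
            "forward_cut and reverse_cut must be equal when "
            "mixed_orientation is set"
        )


def batches(sample_ids, batch_size: int):
    """Yield lists of sample IDs; a batch_size of 0 yields one batch with everything."""
    if batch_size < 0:
        raise ValueError("batch_size must be 0 or greater")
    sample_ids = list(sample_ids)
    if batch_size == 0:
        batch_size = len(sample_ids) or 1
    for start in range(0, len(sample_ids), batch_size):
        yield sample_ids[start:start + batch_size]


samples = ["s1", "s2", "s3", "s4", "s5"]  # hypothetical sample IDs
print(list(batches(samples, 2)))  # [['s1', 's2'], ['s3', 's4'], ['s5']]
print(list(batches(samples, 0)))  # [['s1', 's2', 's3', 's4', 's5']]
check_cuts(forward_cut=3, reverse_cut=3, mixed_orientation=True)  # passes silently
```

Smaller batches mean fewer per-sample output files are open at once, which is why a lower batch_size may help with "too many files" errors.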
Outputs¶
- per_sample_sequences:
SampleData[PairedEndSequencesWithQuality]
The resulting demultiplexed sequences.[required]
- untrimmed_sequences:
MultiplexedPairedEndBarcodeInSequence
The sequences that were unmatched to barcodes.[required]
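As a rough picture of how forward and optional reverse barcodes could decide which output a read pair lands in, the sketch below assigns a pair to a sample using exact, anchored matches only. This is a deliberate simplification and an assumption about the general idea, not the plugin's or cutadapt's algorithm: it ignores error_rate, unanchored matches, and mixed orientation. The function name, sample IDs, and sequences are hypothetical.

```python
def assign_pair(fwd_read, rev_read, forward_barcodes, reverse_barcodes=None):
    """Return (sample_id, trimmed_fwd, trimmed_rev), or None if the pair is unmatched."""
    for sample_id, fwd_bc in forward_barcodes.items():
        if not fwd_read.startswith(fwd_bc):
            continue
        if reverse_barcodes is None:
            # Only forward barcodes were supplied: the forward match decides.
            return sample_id, fwd_read[len(fwd_bc):], rev_read
        rev_bc = reverse_barcodes[sample_id]
        if rev_read.startswith(rev_bc):
            return sample_id, fwd_read[len(fwd_bc):], rev_read[len(rev_bc):]
    return None


# Hypothetical dual-indexed samples that share a forward barcode.
forward_barcodes = {"s1": "AACC", "s2": "AACC"}
reverse_barcodes = {"s1": "GGTT", "s2": "TTGG"}

print(assign_pair("AACCTTTT", "TTGGCCCC", forward_barcodes, reverse_barcodes))
# ('s2', 'TTTT', 'CCCC')
print(assign_pair("GGGGTTTT", "TTGGCCCC", forward_barcodes, reverse_barcodes))
# None
```

In this picture, pairs that return a sample ID correspond to per_sample_sequences and pairs that return None correspond to untrimmed_sequences.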
Examples¶
paired¶
wget -O 'seqs.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/seqs.qza'
wget -O 'md.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/md.tsv'
qiime cutadapt demux-paired \
  --i-seqs seqs.qza \
  --m-forward-barcodes-file md.tsv \
  --m-forward-barcodes-column barcode-sequence \
  --o-per-sample-sequences per-sample-sequences.qza \
  --o-untrimmed-sequences untrimmed-sequences.qza
from qiime2 import Artifact
from qiime2 import Metadata
from urllib import request
import qiime2.plugins.cutadapt.actions as cutadapt_actions
# Download and load the multiplexed paired-end sequences.
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/seqs.qza'
fn = 'seqs.qza'
request.urlretrieve(url, fn)
seqs = Artifact.load(fn)
# Download the sample metadata and select the forward barcode column.
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/md.tsv'
fn = 'md.tsv'
request.urlretrieve(url, fn)
md_md = Metadata.load(fn)
barcodes_mdc = md_md.get_column('barcode-sequence')
# Demultiplex; the action returns the per-sample and the unmatched sequences.
per_sample_sequences, untrimmed_sequences = cutadapt_actions.demux_paired(
    seqs=seqs,
    forward_barcodes=barcodes_mdc,
)
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: seqs.qza
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/seqs.qza
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: md.tsv
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/md.tsv
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 cutadapt demux-paired tool:
  - Set "seqs" to #: seqs.qza
  - For "forward_barcodes":
    - Leave as Metadata from TSV
    - Set "Metadata Source" to md.tsv
    - Set "Column Name" to barcode-sequence
  - Press the Execute button.
library(reticulate)
Artifact <- import("qiime2")$Artifact
Metadata <- import("qiime2")$Metadata
cutadapt_actions <- import("qiime2.plugins.cutadapt.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/seqs.qza'
fn <- 'seqs.qza'
request$urlretrieve(url, fn)
seqs <- Artifact$load(fn)
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/cutadapt/demux-paired/1/md.tsv'
fn <- 'md.tsv'
request$urlretrieve(url, fn)
md_md <- Metadata$load(fn)
barcodes_mdc <- md_md$get_column('barcode-sequence')
action_results <- cutadapt_actions$demux_paired(
  seqs=seqs,
  forward_barcodes=barcodes_mdc,
)
per_sample_sequences <- action_results$per_sample_sequences
untrimmed_sequences <- action_results$untrimmed_sequences
from q2_cutadapt._examples import cutadapt_demux_paired
cutadapt_demux_paired(use)
- Available Distros
- 2024.10
- 2024.10/amplicon
- 2024.10/metagenome
- 2024.5
- 2024.5/amplicon
- 2024.5/metagenome
- 2024.2
- 2024.2/amplicon
- 2023.9
- 2023.9/amplicon
- 2023.7
- 2023.7/core