This is a guide for novice QIIME 2 users, and particularly for those who are new to microbiome research. For experienced users who are already well versed in microbiome analysis (and those who are averse to uncontrolled use of emoji) mosey on over to .
Welcome all newcomers! 👋 This guide will give you a conceptual overview of many of the plugins and actions available in QIIME 2, and guide you to relevant documentation for deeper exploration. As an Explanation article, this document doesn’t provide specific commands to run, but rather discusses at a higher level what your analysis workflow might entail. If you want specific commands that you can run and then adapt for your own work, our Tutorial articles are more aligned with what you’re looking for. We generally recommend starting with the Moving Pictures tutorial 🎥.
Consider this document to be your treasure map: QIIME 2 actions are the stepping stones on your path, and the flowcharts below will tell you where all the goodies are buried. 🗺️
Remember, many paths lead from the foot of the mountain, but at the peak we all gaze at the same moon. 🌕
Let’s get oriented¶
Flowcharts¶
Before we begin talking about specific plugins and actions, we will discuss a conceptual overview of a typical workflow for analyzing marker gene sequence data. And before we look at that overview, we must look at the key to our treasure map:

Figure 1: Each type of result (i.e., Artifacts and Visualizations) and action (i.e., methods, visualizers, and pipelines) is represented by a different color-coded node. The edges connecting each node are either solid (representing either required input or output) or dashed (representing optional input).
In the flowcharts below:
- Actions are labeled with the name of the plugin and the name of the action. To learn more about how to use a specific plugin and action, you can look it up in Available plugins.
- Artifacts are labeled by their artifact class.
- Visualizations are variously labeled as “visualization,” some name that represents the information shown in that visualization, or replaced with an image representing some of the tasty information you might find inside that visualization... 🍙
Useful points for beginners¶
Just a few more important points before we go further:
- The guide below is not exhaustive by any means.
It only covers some of the chief actions in the QIIME 2 amplicon distribution.
There are many more actions and plugins to discover.
Curious to learn more?
Refer to Available plugins, or if you’re working on the command line, call `qiime --help`.
- The flowcharts below are designed to be as simple as possible, and hence omit many of the inputs (particularly optional inputs and metadata) and outputs (particularly statistical summaries and other minor outputs) and all of the possible parameters from most actions. Many additional actions (e.g., for displaying statistical summaries or fiddling with feature tables 🎻) are also omitted. Now that you know all about the help documentation (Available plugins), use it to learn more about individual actions, and other actions present in a plugin (hint: if a plugin has additional actions not described here, they are probably used to examine the output of other actions in that plugin).
- Metadata is a central concept in QIIME 2. We do not extensively discuss metadata in this guide. Instead, find discussion of metadata in .
- There is no one way to do things in QIIME 2. Nor is there a “QIIME 2” approach. Many paths lead from the foot of the mountain... ⛰️ Many of the plugins and actions in QIIME 2 wrap independent software or pre-existing methods. The QIIME 2 Framework (Q2F), discussed in Using QIIME 2, is the glue that makes the magic happen.
- Do not forget to cite appropriately! Unsure what to cite? To see a plugin’s or method’s relevant citations, refer to its help text. Or, better yet, view an artifact or visualization using QIIME 2 View. The “citations” tab will contain information on all relevant citations used for the generation of that file. Groovy. 😎
💃💃💃
Conceptual overview¶
Now let us examine a conceptual overview of the various possible workflows for examining marker gene sequence data (Figure 2). QIIME 2 allows you to enter or exit anywhere you’d like, so you can use QIIME 2 for any or all of these steps.

Figure 2: Flowchart providing an overview of a typical QIIME 2-based microbiome marker gene analysis workflow. The edges and nodes in this overview do not represent specific actions or data types, but instead represent conceptual categories, e.g., the basic types of data or analytical goals we might have in an experiment. Discussion of these steps and terms follows.
All data must be imported into a QIIME 2 artifact to be used by a QIIME 2 action (with the exception of metadata).
Most users start with either multiplexed (e.g., between one and three FASTQ files) or demultiplexed (e.g., a collection of n `.fastq` files, where n is the number of samples, or two times the number of samples) raw sequence data.
If possible, we recommend starting with demultiplexed sequence data - this prevents you from having to understand how sequences were multiplexed and how they need to be demultiplexed.
Whoever did your sequencing should already have that information and know how to do this.
Other users may start downstream, because some data processing has already been performed. For example, you can also start your QIIME 2 analysis with a feature table (`.biom` or `.tsv` file) generated with some other tool.
How to import and export data helps you identify what type of data you have, and provides specific instructions on importing different types of data.
Now that we understand that we can actually enter into this overview workflow at nearly any of the nodes, let us walk through individual sections.
- All marker gene sequencing experiments begin, at some point or another, as multiplexed sequence data. This is probably in `.fastq` files containing DNA sequences and quality scores for each base.
- The sequence data must be demultiplexed, such that each observed sequence read is associated with the sample that it was observed in, or discarded if its sample of origin could not be determined.
- Reads then undergo quality control (i.e., denoising), and amplicon sequence variants (ASVs) or operational taxonomic units (OTUs) should be defined. The goals of these steps are to remove sequencing errors and to dereplicate sequences to make downstream analyses more performant. These steps result in: a. a feature table that tabulates counts of ASVs (or OTUs) on a per-sample basis, and b. feature sequences - a mapping of ASV (or OTU) identifiers to the sequences they represent.
These artifacts (the feature table and feature sequences) are central to most downstream analyses. Common analyses include:
- Taxonomic annotation of sequences, which lets you determine which taxa (e.g., species, genera, phyla) are present.
- Alpha and beta diversity analyses, or measures of diversity within and between samples, respectively. These enable assessment of how similar or different samples are to one another. Some diversity metrics integrate measures of phylogenetic similarity between individual features. If you are sequencing phylogenetic markers (e.g., 16S rRNA genes), you can construct a phylogenetic tree from your feature sequences to use when calculating phylogenetic diversity metrics.
- Differential abundance testing, to determine which features (OTUs, ASVs, taxa, etc) are significantly more/less abundant in different experimental groups.
This is just the beginning, and many other statistical tests and plotting methods are at your fingertips in QIIME 2 and in the lands beyond. The world is your oyster. Let’s dive in. 🏊
Demultiplexing¶
Okay! Imagine we have just received some FASTQ data, hot off the sequencing instrument. Most next-gen sequencing instruments have the capacity to analyze hundreds or even thousands of samples in a single lane/run; we do so by multiplexing these samples, which is just a fancy word for mixing a whole bunch of stuff together. How do we know which sample each read came from? This is typically done by appending a unique barcode (a.k.a. index or tag) sequence to one or both ends of each sequence. Detecting these barcode sequences and mapping them back to the samples they belong to allows us to demultiplex our sequences.
You (or whoever prepared and sequenced your samples) should know which barcode is associated with each sample -- if you do not know, talk to your lab mates or sequencing center. Include this barcode information in your sample metadata file.
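To make the idea concrete, here is a toy sketch of barcode-based demultiplexing in Python. Everything here (barcodes, sample names, reads) is made up for illustration - in practice you would use the demultiplexing actions in `q2-demux` or `q2-cutadapt` rather than rolling your own:

```python
# Conceptual sketch of demultiplexing: assign each read to a sample by
# matching the barcode at its 5' end against a barcode-to-sample map
# (the kind of mapping you'd record in your sample metadata file).
from collections import defaultdict

BARCODE_TO_SAMPLE = {  # hypothetical 4-nt barcodes
    "ACGT": "sample-1",
    "TGCA": "sample-2",
}

def demultiplex(reads, barcode_len=4):
    """Bin reads by barcode; reads with unknown barcodes are discarded."""
    bins = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = BARCODE_TO_SAMPLE.get(barcode)
        if sample is not None:  # discard reads whose origin is unknown
            bins[sample].append(insert)
    return dict(bins)

reads = ["ACGTGGGG", "TGCATTTT", "NNNNAAAA"]
print(demultiplex(reads))
```

The read whose barcode matches no sample is discarded, just as described above.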
The process of demultiplexing (as it occurs in QIIME 2) will look something like Figure 3 (ignore the right-hand side of this flow chart for now).

Figure 3: Flowchart of demultiplexing and denoising workflows in QIIME 2.
This flowchart describes all demultiplexing steps that are currently possible in QIIME 2, depending on the type of raw data you have imported.
Usually only one of the different demultiplexing actions available in `q2-demux` or `q2-cutadapt` will be applicable for your data, and that is all you will need.
Read more about demultiplexing and give it a spin with the Moving Pictures tutorial 🎥. That tutorial covers Earth Microbiome Project format data.
If instead you have barcodes and primers in-line in your reads, see the cutadapt tutorials.
Have dual-indexed reads or mixed-orientation reads or some other unusual format? Search the QIIME 2 Forum for advice.
Paired-end read joining¶
If you’re working with Illumina paired-end reads, they will typically need to be joined at some point in the analysis.
If you read How to merge Illumina paired-end reads, you will see that this happens automatically during denoising with `q2-dada2`.
However, if you want to use `q2-deblur` or an OTU clustering method (as described in more detail below), use `q2-vsearch` to join these reads before proceeding, as shown in Figure 3.
If you are beginning to pull out your hair and foam at the mouth, do not despair: QIIME 2 tends to get easier the further we travel in the “general overview” (Figure 2). Importing and demultiplexing raw sequencing data happens to be the most frustrating part for most new users because there are so many different ways that marker gene data can be generated. But once you get the hang of it, it’s a piece of cake. 🍰
Denoising and clustering¶
Congratulations on getting this far! Denoising and clustering steps are slightly less confusing than importing and demultiplexing! 🎉😬🎉
The names for these steps are very descriptive:
- We denoise our sequences to remove and/or correct noisy reads. 🔊
- We dereplicate our sequences to reduce repetition and file size/memory requirements in downstream steps (don’t worry! we keep count of each replicate). 🕵️
- We (optionally) cluster sequences to collapse similar sequences (e.g., those that are ≥ 97% similar to each other) into single replicate sequences. This process, also known as OTU picking, was once a common procedure, used both to dereplicate and to perform a sort of quick-and-dirty denoising (to capture stochastic sequencing and PCR errors, which should be rare and similar to more abundant centroid sequences). We recommend skipping clustering in favor of denoising, unless you have a really strong reason not to.
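Dereplication is the easiest of these three steps to picture: collapse identical reads into unique sequences, keeping a count of each. A minimal sketch with made-up reads:

```python
# Conceptual sketch of dereplication: collapse identical reads into unique
# sequences while keeping per-sequence counts. This is effectively what
# produces "100% OTUs" - every unique sequence observed in the dataset.
from collections import Counter

reads = ["ACGT", "ACGT", "ACGT", "TTTT", "TTTT", "GGGG"]
derep = Counter(reads)

# unique sequences, most abundant first
print(derep.most_common())
```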
Denoising¶
Let’s start with denoising, which is depicted on the right-hand side of Figure 3.
The denoising methods currently available in QIIME 2 include DADA2 and Deblur.
You can learn more about those methods by reading the original publications for each.
Examples of using both are presented in the Moving Pictures tutorial 🎥.
Note that deblur (and also vsearch `dereplicate-sequences`) should be preceded by basic quality-score-based filtering, but this is unnecessary for DADA2.
Both Deblur and DADA2 contain internal chimera checking methods and abundance filtering, so additional filtering should not be necessary following these methods.
🦁🐐🐍
To put it simply, these methods filter out noisy sequences, correct errors in marginal sequences (in the case of DADA2), remove chimeric sequences, remove singletons, join denoised paired-end reads (in the case of DADA2), and then dereplicate those sequences. 😎
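As a toy illustration of the “basic quality-score-based filtering” mentioned above (the kind that should precede Deblur), here is a sketch that truncates each read at its first low-quality base and discards reads that end up too short. The thresholds here are arbitrary values chosen for the example, not recommendations:

```python
# Conceptual sketch of basic quality-score-based filtering: truncate each
# read at the first base whose Phred quality score drops below a threshold,
# and discard reads that become too short to be useful.

def quality_trim(seq, phred_scores, min_q=20, min_len=4):
    """Truncate at the first base with Phred quality < min_q."""
    for i, q in enumerate(phred_scores):
        if q < min_q:
            seq = seq[:i]
            break
    return seq if len(seq) >= min_len else None

print(quality_trim("ACGTACGT", [38, 37, 36, 35, 30, 12, 10, 2]))  # truncated
print(quality_trim("ACGT",     [38, 10, 2, 2]))                   # discarded
```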
The features produced by denoising methods go by many names, usually some variant of “sequence variant” (SV), “amplicon SV” (ASV), “actual SV”, “exact SV”... We tend to use amplicon sequence variant (ASV) in the QIIME 2 documentation, and we’ll stick with that here. 📏
Clustering¶
Next we will discuss clustering methods. Dereplication (the simplest clustering method, effectively producing 100% OTUs, i.e., all unique sequences observed in the dataset) is depicted in Figure 4, and is the necessary starting point for all other clustering methods in QIIME 2.

Figure 4: Flowchart of OTU clustering, chimera filtering, and abundance filtering workflows in QIIME 2.
`q2-vsearch` implements three different OTU clustering strategies: de novo, closed reference, and open reference.
All should be preceded by basic quality-score-based filtering and followed by chimera filtering and aggressive OTU filtering (the treacherous trio, a.k.a. the Bokulich method).
🙈🙉🙊
A separate tutorial demonstrates the use of several `q2-vsearch` clustering methods.
Don’t forget to read the chimera filtering tutorial as well.
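To illustrate the idea behind de novo clustering, here is a toy greedy clustering sketch: sequences are processed from most to least abundant, and each either joins the first existing centroid it matches at or above the identity threshold or founds a new cluster. The identity function here is deliberately naive (position-wise matches on equal-length sequences); real tools like vsearch use proper alignment:

```python
# Conceptual sketch of greedy de novo OTU clustering at a fixed identity
# threshold (e.g., 97%). Illustrative only - not the vsearch algorithm.

def identity(a, b):
    """Fraction of matching positions (toy metric; assumes equal lengths)."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cluster_de_novo(seqs_by_abundance, threshold=0.97):
    centroids = []
    members = {}
    for seq in seqs_by_abundance:  # most abundant first
        for c in centroids:
            if identity(seq, c) >= threshold:
                members[c].append(seq)  # joins an existing cluster
                break
        else:
            centroids.append(seq)       # founds a new cluster
            members[seq] = [seq]
    return members

s1 = "A" * 100
s2 = "A" * 99 + "T"  # 99% identical to s1 -> same cluster at 97%
s3 = "G" * 100       # distinct -> new centroid
clusters = cluster_de_novo([s1, s2, s3])
print({centroid[:5] + "...": len(m) for centroid, m in clusters.items()})
```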
The feature table¶
The final products of all denoising and clustering methods/workflows are a `FeatureTable` (feature table) artifact and a `FeatureData[Sequence]` (representative sequences) artifact.
These are two of the most important artifact classes in a marker gene sequencing workflow, and are used for many downstream analyses, as discussed below.
Indeed, feature tables are crucial to any QIIME 2 analysis, as the central record of the counts of features per sample.
Such an important artifact deserves its own powerful plugin:
q2-feature-table plugin documentation
feature-table¶
This is a QIIME 2 plugin supporting operations on sample by feature tables, such as filtering, merging, and transforming tables.
- version: 2024.10.0
- website: https://github.com/qiime2/q2-feature-table
- user support: Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
Actions¶
Name | Type | Short Description |
---|---|---|
rarefy | method | Rarefy table |
subsample-ids | method | Subsample table |
presence-absence | method | Convert to presence/absence |
relative-frequency | method | Convert to relative frequencies |
transpose | method | Transpose a feature table. |
group | method | Group samples or features by a metadata column |
merge | method | Combine multiple tables |
merge-seqs | method | Combine collections of feature sequences |
merge-taxa | method | Combine collections of feature taxonomies |
rename-ids | method | Renames sample or feature ids in a table |
filter-samples | method | Filter samples from table |
filter-features-conditionally | method | Filter features from a table based on abundance and prevalence |
filter-features | method | Filter features from table |
filter-seqs | method | Filter features from sequences |
split | method | Split one feature table into many |
tabulate-feature-frequencies | method | Tabulate feature frequencies |
tabulate-sample-frequencies | method | Tabulate sample frequencies |
summarize | visualizer | Summarize table |
tabulate-seqs | visualizer | View sequence associated with each feature |
core-features | visualizer | Identify core features in table |
heatmap | visualizer | Generate a heatmap representation of a feature table |
summarize-plus | pipeline | Summarize table plus |
We will not discuss all actions of this plugin in detail here (some are mentioned below), but it performs many useful operations on feature tables, so familiarize yourself with its documentation!
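To make a couple of the table operations above concrete, here is a toy sketch of what `relative-frequency` and `presence-absence` conceptually do to a samples-by-features count table (illustrative only; the real actions operate on `FeatureTable` artifacts, not Python dicts):

```python
# Conceptual sketches of two feature-table transforms on a toy
# samples-by-features count table with made-up ASV IDs.

table = {
    "sample-1": {"asv1": 6, "asv2": 2, "asv3": 0},
    "sample-2": {"asv1": 0, "asv2": 5, "asv3": 5},
}

def relative_frequency(table):
    """Convert counts to per-sample proportions (each sample sums to 1)."""
    out = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        out[sample] = {f: c / total for f, c in counts.items()}
    return out

def presence_absence(table):
    """Convert counts to True/False: was the feature observed at all?"""
    return {s: {f: c > 0 for f, c in counts.items()}
            for s, counts in table.items()}

print(relative_frequency(table)["sample-1"])
print(presence_absence(table)["sample-2"])
```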
Congratulations! You’ve made it through importing, demultiplexing, and denoising/clustering your data, which are the most complicated and difficult steps for most users (if only because there are so many ways to do it!). If you’ve made it this far, the rest should be easy. Now begins the fun. 🍾
Taxonomy classification (or annotation) and taxonomic analyses¶
For many experiments, investigators aim to identify the organisms that are present in a sample. For example:
- How do the genera or species in a system change over time?
- Are there any potential human pathogens in this patient’s sample?
- What’s swimming in my wine? 🍷🤑
We can do this by comparing our feature sequences (be they ASVs or OTUs) to a reference database of sequences with known taxonomic composition. Simply finding the closest alignment is not really good enough -- because other sequences that are equally close matches or nearly as close may have different taxonomic annotations. So we use taxonomy classifiers to determine the closest taxonomic affiliation with some degree of confidence or consensus (which may not be a species name if one cannot be predicted with certainty!), based on alignment, k-mer frequencies, etc. Those interested in learning more about the relative performance of the taxonomy classifiers in QIIME 2 can read until the cows come home. And if you want to learn about how the algorithms work, you can refer to the Sequence Homology Searching chapter of An Introduction to Applied Bioinformatics. 🐄🐄🐄
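For a feel for what “k-mer frequencies” means here, this sketch turns a sequence into the bag-of-overlapping-k-mers representation that machine-learning classifiers are typically trained on. The choice of k and the sequence are arbitrary for the example:

```python
# Conceptual sketch of k-mer feature extraction: slide a window of length k
# along the sequence and count each k-mer. Classifiers learn which k-mer
# profiles distinguish taxonomic groups.
from collections import Counter

def kmer_counts(seq, k=8):
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

seq = "ACGTACGTACGT"
print(kmer_counts(seq, k=8))
```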
Figure 5 shows what a taxonomy classification workflow might look like.

Figure 5: Flowchart of taxonomic annotation workflows in QIIME 2.
Alignment-based taxonomic classification¶
`q2-feature-classifier` contains three different classification methods.
`classify-consensus-blast` and `classify-consensus-vsearch` are both alignment-based methods that find a consensus assignment across N top hits.
These methods take reference database `FeatureData[Taxonomy]` and `FeatureData[Sequence]` files directly, and do not need to be pre-trained.
Machine-learning-based taxonomic classification¶
Machine-learning-based classification methods are available through `classify-sklearn`, and theoretically can apply any of the classification methods available in scikit-learn.
These classifiers must be trained, e.g., to learn which features best distinguish each taxonomic group, adding an additional step to the classification process.
Classifier training is reference database- and marker-gene-specific and only needs to happen once per marker-gene/reference database combination; that classifier may then be re-used as many times as you like without needing to re-train!
Training your own feature classifiers¶
If you’re working with an uncommon marker gene, you may need to train your own feature classifier.
This is possible following the steps in the classifier training tutorial.
The `rescript` plugin also contains many tools that can be useful in preparing reference data for training classifiers.
Most users don’t need to train their own classifiers, however, as the QIIME 2 developers provide classifiers to the public for common marker genes in the QIIME 2 Library.
🎅🎁🎅🎁🎅🎁
Environment-weighted classifiers¶
Typical Naive Bayes classifiers treat all reference sequences as being equally likely to be observed in a sample. Environment-weighted taxonomic classifiers, on the other hand, use public microbiome data to weight taxa by their past frequency of being observed in specific sample types. This can improve the accuracy and the resolution of marker gene classification, and we recommend using weighted classifiers when possible. You can find environment-weighted classifiers for 16S rRNA in the QIIME 2 Library. If the environment type that you’re studying isn’t one of the ones that pre-trained classifiers are provided for, the “diverse weighted” classifiers may still be relevant. These are trained on weights from multiple different environment types, and have been shown to perform better than classifiers that assume equal weights for all taxa.
Which feature classification method is best?¶
They are all pretty good, otherwise we wouldn’t bother exposing them in `q2-feature-classifier`.
But in general `classify-sklearn` with a Naive Bayes classifier can slightly outperform other methods we’ve tested based on several criteria for classification of 16S rRNA gene and fungal ITS sequences.
It can be more difficult and frustrating for some users, however, since it requires that additional training step.
That training step can be memory intensive, becoming a barrier for some users who are unable to use the pre-trained classifiers.
Some users also prefer the alignment-based methods because their mode of operation is much more transparent and their parameters easier to manipulate.
Feature classification can be slow¶
Runtime of feature classifiers is a function of the number of sequences to be classified, and the number of reference sequences. If runtime is an issue for you, consider filtering low-abundance features out of your sequences file before classifying (e.g., those that are present in only a single sample), and use smaller reference databases if possible. In practice, in “normal size” sequencing experiments (whatever that means 😜) we see variations between a few minutes (a few hundred features) to hours or days (hundreds of thousands of features) for classification to complete. If you want to hang some numbers on there, check out our benchmarks for classifier runtime performance. 🏃⏱️
Feature classification can be memory intensive¶
Generally at least 8 GB of RAM are required, though 16 GB is better. Memory requirements are generally related to the size of the reference database, and in some cases 32 GB of RAM or more are required.
Examples of using `classify-sklearn` are shown in the Moving Pictures tutorial 🎥.
Figure 5 should make the other classifier methods reasonably clear.
All classifiers produce a `FeatureData[Taxonomy]` artifact, tabulating the taxonomic annotation for each query sequence.
If you want to review those, or compare them across different classifiers, refer back to Reviewing information about observed sequences.
Taxonomic analysis¶
Taxonomic classification opens us up to a whole new world of possibilities. 🌎
Here are some popular actions that are enabled by having a `FeatureData[Taxonomy]` artifact:
- Collapse your feature table with `taxa collapse`! This groups all features that share the same taxonomic assignment into a single feature. That taxonomic assignment becomes the feature ID in the new feature table. This feature table can be used in all the same ways as the original. Some users may be specifically interested in performing, e.g., taxonomy-based diversity analyses, but at the very least anyone assigning taxonomy is probably interested in assessing differential abundance of those taxa. Comparing differential abundance analyses using taxa as features versus using ASVs or OTUs as features can be diagnostic and informative for various analyses.
- Plot your taxonomic composition to see the abundance of various taxa in each of your samples. Check out `taxa barplot` and `feature-table heatmap` for more details. 📊
- Filter your feature table and feature sequences to remove certain taxonomic groups. This is useful for removing known contaminants or non-target groups, e.g., host DNA including mitochondrial or chloroplast sequences. It can also be useful for focusing on specific groups for deeper analysis. See the filtering tutorial for more details and examples. 🌿🐀
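As a toy illustration of what collapsing does, here is a sketch that sums features sharing a taxonomic assignment (made-up ASV IDs and genera; the real `taxa collapse` action works on artifacts and lets you pick the rank to collapse at):

```python
# Conceptual sketch of collapsing a feature table by taxonomy: features
# sharing a taxonomic assignment are summed into one feature whose ID is
# that assignment.
from collections import defaultdict

taxonomy = {"asv1": "g__Lactobacillus", "asv2": "g__Lactobacillus",
            "asv3": "g__Streptococcus"}
table = {"sample-1": {"asv1": 3, "asv2": 4, "asv3": 1}}

def collapse(table, taxonomy):
    out = {}
    for sample, counts in table.items():
        grouped = defaultdict(int)
        for feature, count in counts.items():
            grouped[taxonomy[feature]] += count  # sum counts per taxon
        out[sample] = dict(grouped)
    return out

print(collapse(table, taxonomy))
```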
Sequence alignment and phylogenetic reconstruction¶
Some diversity metrics - notably Faith’s Phylogenetic Diversity (PD) and UniFrac - integrate the phylogenetic similarity of features. If you are sequencing phylogenetic markers (e.g., 16S rRNA genes), you can build a phylogenetic tree that can be used for computing these metrics.
The different options for aligning sequences and producing a phylogeny are shown in the flowchart below, and can be classified as de novo or reference-based. For a detailed discussion of alignment and phylogeny building, see the q2-phylogeny tutorial and q2-fragment-insertion. 🌳

Figure 6: Flowchart of alignment and phylogenetic reconstruction workflows in QIIME 2.
Now that we have our rooted phylogenetic tree (i.e., an artifact of class `Phylogeny[Rooted]`), let’s use it!
Diversity analysis¶
In microbiome experiments, investigators frequently wonder about things like:
- How many different species/OTUs/ASVs are present in my samples?
- Which of my samples represent more phylogenetic diversity?
- Does the microbiome composition of my samples differ based on sample categories (e.g., healthy versus disease)?
- What factors (e.g., pH, elevation, blood pressure, body site, or host species just to name a few examples) are correlated with differences in microbial composition and biodiversity?
These questions can be answered by alpha- and beta-diversity analyses. Alpha diversity measures the level of diversity within individual samples. Beta diversity measures assess the dissimilarity between samples. We can then use this information to statistically test whether alpha diversity is different between groups of samples (indicating, for example, that those groups have more/less species richness) and whether beta diversity is greater across groups (indicating, for example, that samples within a group are more similar to each other than those in another group, suggesting that membership within these groups is shaping the microbial composition of those samples).
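Two of the simplest metrics make the alpha/beta distinction concrete: Shannon entropy (alpha, within one sample) and Bray-Curtis dissimilarity (beta, between two samples). A toy sketch on made-up count vectors (illustrative only; `q2-diversity` provides these and many more, including phylogenetic metrics):

```python
# Conceptual sketches of one alpha metric and one beta metric.
import math

def shannon(counts):
    """Shannon entropy (alpha diversity) of one sample's counts, in bits."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, 2) for p in props)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity (beta diversity) between two samples."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(a) + sum(b)
    return num / den

even   = [25, 25, 25, 25]  # four features, evenly abundant
skewed = [97, 1, 1, 1]     # four features, dominated by one
print(round(shannon(even), 3), round(shannon(skewed), 3))
print(bray_curtis(even, skewed))
```

Note how the even community scores higher Shannon diversity than the skewed one, even though both contain four features.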
Different types of diversity analyses in QIIME 2 are exemplified in the Moving Pictures tutorial 🎥. The actions used to generate diversity artifacts are shown in Figure 7, and many other tools can operate on these results.

Figure 7: Flowchart of diversity analysis workflows in QIIME 2.
The `q2-diversity` plugin contains many different useful actions.
Check them out to learn more.
As you can see in the flowchart, the `diversity core-metrics*` pipelines (`core-metrics` and `core-metrics-phylogenetic`) encompass many different core diversity commands, and in the process produce the main diversity-related artifacts that can be used in downstream analyses.
These are:
- `SampleData[AlphaDiversity]` artifacts, which contain alpha diversity estimates for each sample in your feature table. This is the chief artifact for alpha diversity analyses.
- `DistanceMatrix` artifacts, containing the pairwise distance/dissimilarity between each pair of samples in your feature table. This is the chief artifact for beta diversity analyses.
- `PCoAResults` artifacts, containing principal coordinates ordination results for each distance/dissimilarity metric. Principal coordinates analysis is a dimensionality reduction technique, facilitating visual comparisons of sample (dis)similarities in 2D or 3D space. Learn more about ordination in Ordination Methods for Ecologists and in the Machine learning in bioinformatics section of An Introduction to Applied Bioinformatics (Bolyen et al., 2018).
These are the main diversity-related artifacts.
We can re-use these data in all sorts of downstream analyses, or in the various actions of `q2-diversity` shown in the flowchart.
Many of these actions are demonstrated in the Moving Pictures tutorial 🎥 so head on over there to learn more!
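If you are curious what `PCoAResults` actually contain, principal coordinates analysis (classical multidimensional scaling) can be sketched in a few lines of NumPy: double-center the squared distance matrix, eigendecompose, and scale the top eigenvectors to get per-sample coordinates. A toy sketch (illustrative only; the toy distance matrix is made up):

```python
# Conceptual sketch of PCoA (classical MDS) on a distance matrix.
import numpy as np

def pcoa(dist, n_axes=2):
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ d2 @ J                 # Gower double-centering
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1]        # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]
    # coordinates: eigenvectors scaled by sqrt of (non-negative) eigenvalues
    coords = vecs[:, :n_axes] * np.sqrt(np.maximum(vals[:n_axes], 0))
    return coords

# three samples: 1 and 2 are close to each other, 3 is far from both
dist = [[0, 1, 5],
        [1, 0, 5],
        [5, 5, 0]]
coords = pcoa(dist)
print(coords.round(2))
```

For a Euclidean-embeddable distance matrix like this one, the pairwise distances between the output coordinates reproduce the input distances.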
Note that there are many different alpha- and beta-diversity metrics that are available in QIIME 2. To learn more (and figure out whose paper you should be citing!), check out that neat resource, which was contributed by a friendly QIIME 2 user to enlighten all of us. Thanks Stephanie! 😁🙏😁🙏😁🙏
Fun with feature tables¶
At this point you have a feature table, taxonomy classification results, alpha diversity, and beta diversity results. Oh my! 🦁🐯🐻
Taxonomic and diversity analyses, as described above, are the basic types of analyses that most QIIME 2 users are probably going to need to perform at some point. However, this is only the beginning, and there are so many more advanced analyses at our fingertips. 🖐️⌨️

Figure 8: Flowchart of “downstream” analysis workflows in QIIME 2. Note: this figure needs some updates; specifically, gneiss was deprecated and is no longer part of the amplicon distribution.
We are only going to give a brief overview, since each of these analyses has its own in-depth tutorial to guide us:
- Analyze longitudinal data: `q2-longitudinal` is a plugin for performing statistical analyses of longitudinal experiments, i.e., where samples are collected from individual patients/subjects/sites repeatedly over time. This includes longitudinal studies of alpha and beta diversity, and some really awesome, interactive plots. 📈🍝
- Predict the future (or the past) 🔮: `q2-sample-classifier` is a plugin for machine-learning 🤖 analyses of feature data. Both classification and regression models are supported. This allows you to do things like:
  - predict sample metadata as a function of feature data (e.g., can we use a fecal sample to predict cancer susceptibility? Or predict wine quality based on the microbial composition of grapes before fermentation?). 🍇
  - identify features that are predictive of different sample characteristics. 🚀
  - quantify rates of microbial maturation (e.g., to track normal microbiome development in the infant gut and the impacts of persistent malnutrition or antibiotics, diet, and delivery mode). 👶
  - predict outliers and mislabeled samples. 👹
- Differential abundance testing is used to determine which features are significantly more/less abundant in different groups of samples. QIIME 2 currently supports a few different approaches to differential abundance testing, including `ancom-bc` in `q2-composition`. 👾👾👾
- Evaluate and control data quality: `q2-quality-control` is a plugin for evaluating and controlling sequence data quality. This includes actions that:
  - test the accuracy of different bioinformatic or molecular methods, or of run-to-run quality variation. These actions are typically used if users have samples with known compositions, e.g., mock communities, since accuracy is calculated as the similarity between the observed and expected compositions, sequences, etc. But more creative uses may be possible...
  - filter sequences based on alignment to a reference database, or that contain specific short sections of DNA (e.g., primer sequences). This is useful for removing sequences that match a specific group of organisms, non-target DNA, or other nonsense. 🙃
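One idea shared by several composition-aware differential abundance approaches is working in log-ratio space rather than on raw counts. As a toy illustration, here is the centered log-ratio (CLR) transform with a pseudocount - a sketch of the general log-ratio concept, not of the `ancom-bc` implementation specifically:

```python
# Conceptual sketch of the centered log-ratio (CLR) transform: each count
# is log-transformed relative to the sample's geometric mean, after adding
# a pseudocount to handle zeros. CLR values in a sample sum to zero.
import math

def clr(counts, pseudocount=1):
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

sample = [0, 9, 99]
transformed = clr(sample)
print([round(v, 3) for v in transformed])
```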
And that’s just a brief overview! QIIME 2 continues to grow, so stay tuned for more plugins in future releases 📻, and keep your eyes peeled for stand-alone plugins that will continue to expand the functionality available in QIIME 2.
A good next step is to work through the Moving Pictures tutorial 🎥, if you haven’t done so already. That will help you learn how to actually use all of the functionality discussed here on real microbiome sequence data.
Now go forth and have fun! 💃
This is a guide for novice QIIME 2 users, and particularly for those who are new to microbiome research. For experienced users who are already well versed in microbiome analysis (and those who are averse to uncontrolled use of emoji) mosey on over to .
Welcome all newcomers! 👋 This guide will give you a conceptual overview of many of the plugins and actions available in QIIME 2, and guide you to relevant documentation for deeper exploration. As an Explanation article, this document doesn’t provide specific commands to run, but rather discusses at a higher level what your analysis workflow might entail. If you want specific commands that you can run and then adapt for your own work, our Tutorial articles are more aligned with what you’re looking for. We generally recommend starting with the Moving Pictures tutorial 🎥.
Consider this document to be your treasure map: QIIME 2 actions are the stepping stones on your path, and the flowcharts below will tell you where all the goodies are buried. 🗺️
Remember, many paths lead from the foot of the mountain, but at the peak we all gaze at the same moon. 🌕
Let’s get oriented¶
Flowcharts¶
Before we begin talking about specific plugins and actions, we will discuss a conceptual overview of a typical workflow for analyzing marker gene sequence data. And before we look at that overview, we must look at the key to our treasure map:

Figure 1:Each type of result (i.e., Artifacts and Visualizations) and action (i.e., methods, visualizers, and pipelines) is represented by a different color-coded node. The edges connecting each node are either solid (representing either required input or output) or dashed (representing optional input).
In the flowcharts below:
- Actions are labeled with the name of the plugin and the name of the action. To learn more about how to use a specific plugin and action, you can look it up in Available plugins.
- Artifacts are labeled by their artifact class.
- Visualizations are variously labeled as “visualization,” some name that represents the information shown in that visualization, or replaced with an image representing some of the tasty information you might find inside that visualization... 🍙
Useful points for beginners¶
Just a few more important points before we go further:
- The guide below is not exhaustive by any means. It only covers some of the chief actions in the QIIME 2 amplicon distribution. There are many more actions and plugins to discover. Curious to learn more? Refer to Available plugins, or if you’re working on the command line, call qiime --help.
- The flowcharts below are designed to be as simple as possible, and hence omit many of the inputs (particularly optional inputs and metadata) and outputs (particularly statistical summaries and other minor outputs) and all of the possible parameters from most actions. Many additional actions (e.g., for displaying statistical summaries or fiddling with feature tables 🎻) are also omitted. Now that you know all about the help documentation (Available plugins), use it to learn more about individual actions, and other actions present in a plugin (hint: if a plugin has additional actions not described here, they are probably used to examine the output of other actions in that plugin).
- Metadata is a central concept in QIIME 2. We do not extensively discuss metadata in this guide. Instead, find discussion of metadata in .
- There is no one way to do things in QIIME 2. Nor is there a “QIIME 2” approach. Many paths lead from the foot of the mountain... ⛰️ Many of the plugins and actions in QIIME 2 wrap independent software or pre-existing methods. The QIIME 2 Framework (Q2F), discussed in Using QIIME 2, is the glue that makes the magic happen.
- Do not forget to cite appropriately! Unsure what to cite? To see a plugin or method’s relevant citations, refer to its help text. Or, better yet, view an artifact or visualization using QIIME 2 View. The “citations” tab will contain information on all relevant citations used for the generation of that file. Groovy. 😎
💃💃💃
Conceptual overview¶
Now let us examine a conceptual overview of the various possible workflows for examining marker gene sequence data (Figure 2). QIIME 2 allows you to enter or exit anywhere you’d like, so you can use QIIME 2 for any or all of these steps.

Figure 2:Flowchart providing an overview of a typical QIIME 2-based microbiome marker gene analysis workflow. The edges and nodes in this overview do not represent specific actions or data types, but instead represent conceptual categories, e.g., the basic types of data or analytical goals we might have in an experiment. Discussion of these steps and terms follows.
All data must be imported into a QIIME 2 artifact to be used by a QIIME 2 action (with the exception of metadata).
Most users start with either multiplexed (e.g., between one and three FASTQ files) or demultiplexed (e.g., a collection of n .fastq files, where n is the number of samples, or two times the number of samples) raw sequence data.
If possible, we recommend starting with demultiplexed sequence data - this prevents you from having to understand how sequences were multiplexed and how they need to be demultiplexed.
Whoever did your sequencing should already have that information and know how to do this.
Other users may start downstream, because some data processing has already been performed. For example, you can also start your QIIME 2 analysis with a feature table (.biom or .tsv file) generated with some other tool.
How to import and export data helps you identify what type of data you have, and provides specific instructions on importing different types of data.
Now that we understand that we can actually enter into this overview workflow at nearly any of the nodes, let us walk through individual sections.
- All marker gene sequencing experiments begin, at some point or another, as multiplexed sequence data. This is probably in .fastq files containing DNA sequences and quality scores for each base.
- The sequence data must be demultiplexed, such that each observed sequence read is associated with the sample that it was observed in, or discarded if its sample of origin could not be determined.
- Reads then undergo quality control (i.e., denoising), and amplicon sequence variants (ASVs) or operational taxonomic units (OTUs) should be defined. The goals of these steps are to remove sequencing errors and to dereplicate sequences to make downstream analyses more performant. These steps result in: a. a feature table that tabulates counts of ASVs (or OTUs) on a per-sample basis, and b. feature sequences - a mapping of ASV (or OTU) identifiers to the sequences they represent.
These artifacts (the feature table and feature sequences) are central to most downstream analyses. Common analyses include:
- Taxonomic annotation of sequences, which lets you determine which taxa (e.g., species, genera, phyla) are present.
- Alpha and beta diversity analyses, or measures of diversity within and between samples, respectively. These enable assessment of how similar or different samples are to one another. Some diversity metrics integrate measures of phylogenetic similarity between individual features. If you are sequencing phylogenetic markers (e.g., 16S rRNA genes), you can construct a phylogenetic tree from your feature sequences to use when calculating phylogenetic diversity metrics.
- Differential abundance testing, to determine which features (OTUs, ASVs, taxa, etc) are significantly more/less abundant in different experimental groups.
This is just the beginning, and many other statistical tests and plotting methods are at your fingertips in QIIME 2 and in the lands beyond. The world is your oyster. Let’s dive in. 🏊
Demultiplexing¶
Okay! Imagine we have just received some FASTQ data, hot off the sequencing instrument. Most next-gen sequencing instruments have the capacity to analyze hundreds or even thousands of samples in a single lane/run; we do so by multiplexing these samples, which is just a fancy word for mixing a whole bunch of stuff together. How do we know which sample each read came from? This is typically done by appending a unique barcode (a.k.a. index or tag) sequence to one or both ends of each sequence. Detecting these barcode sequences and mapping them back to the samples they belong to allows us to demultiplex our sequences.
You (or whoever prepared and sequenced your samples) should know which barcode is associated with each sample -- if you do not know, talk to your lab mates or sequencing center. Include this barcode information in your sample metadata file.
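To make the core idea concrete, here is a toy sketch of demultiplexing in plain Python. This is not QIIME 2 code, and the barcodes, sample names, and reads are all invented for illustration; real demultiplexers also handle quality scores, barcode errors, and FASTQ parsing.

```python
# Toy demultiplexer: map each read's barcode back to its sample.
# The barcode-to-sample mapping comes from your sample metadata file;
# all values below are made up for illustration.
barcode_to_sample = {
    "ACGT": "sample-1",
    "TGCA": "sample-2",
}

# In this toy example, each read carries its 4 nt barcode at the 5' end.
reads = ["ACGTGGGAAACCC", "TGCATTTGGGAAA", "NNNNCCCGGGTTT"]

demuxed = {sample: [] for sample in barcode_to_sample.values()}
discarded = []

for read in reads:
    barcode, sequence = read[:4], read[4:]
    sample = barcode_to_sample.get(barcode)
    if sample is None:
        discarded.append(read)  # sample of origin unknown -> discard
    else:
        demuxed[sample].append(sequence)
```

Each retained read ends up associated with exactly one sample; reads whose barcode matches nothing in the metadata are set aside, which is why accurate barcode information in your metadata file matters so much.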
The process of demultiplexing (as it occurs in QIIME 2) will look something like Figure 3 (ignore the right-hand side of this flow chart for now).

Figure 3:Flowchart of demultiplexing and denoising workflows in QIIME 2.
This flowchart describes all demultiplexing steps that are currently possible in QIIME 2, depending on the type of raw data you have imported.
Usually only one of the different demultiplexing actions available in q2-demux
or q2-cutadapt
will be applicable for your data, and that is all you will need.
Read more about demultiplexing and give it a spin with the Moving Pictures tutorial 🎥. That tutorial covers Earth Microbiome Project format data.
If instead you have barcodes and primers in-line in your reads, see the cutadapt tutorials.
Have dual-indexed reads, mixed-orientation reads, or some other unusual format? Search the QIIME 2 Forum for advice.
Paired-end read joining¶
If you’re working with Illumina paired-end reads, they will typically need to be joined at some point in the analysis.
If you read How to merge Illumina paired-end reads, you will see that this happens automatically during denoising with q2-dada2
.
However, if you want to use q2-deblur
or an OTU clustering method (as described in more detail below), use q2-vsearch
to join these reads before proceeding, as shown in Figure 3.
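The core idea behind read joining is simple: find where the end of the forward read overlaps the start of the (reverse-complemented) reverse read, and merge the two. Here is a toy sketch of that idea in plain Python; it is not how vsearch or DADA2 actually implement joining (real joiners weigh quality scores and tolerate mismatches), and the sequences are invented:

```python
# Toy read joiner: find the longest suffix of the forward read that
# exactly matches a prefix of the reverse-complemented reverse read,
# then concatenate the non-overlapping remainder.
def join_reads(fwd, rev_rc, min_overlap=4):
    for overlap in range(min(len(fwd), len(rev_rc)), min_overlap - 1, -1):
        if fwd[-overlap:] == rev_rc[:overlap]:
            return fwd + rev_rc[overlap:]
    return None  # no sufficient overlap; this read pair cannot be joined

merged = join_reads("AAACCCGGGTT", "GGGTTACGACG")
```

If your amplicon is longer than the combined read lengths, there is no overlap to find, and read pairs cannot be joined; this is one reason read and amplicon lengths must be planned together.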
If you are beginning to pull out your hair and foam at the mouth, do not despair: QIIME 2 tends to get easier the further we travel in the “general overview” (Figure 2). Importing and demultiplexing raw sequencing data happens to be the most frustrating part for most new users because there are so many different ways that marker gene data can be generated. But once you get the hang of it, it’s a piece of cake. 🍰
Denoising and clustering¶
Congratulations on getting this far! Denoising and clustering steps are slightly less confusing than importing and demultiplexing! 🎉😬🎉
The names for these steps are very descriptive:
- We denoise our sequences to remove and/or correct noisy reads. 🔊
- We dereplicate our sequences to reduce repetition and file size/memory requirements in downstream steps (don’t worry! we keep count of each replicate). 🕵️
- We (optionally) cluster sequences to collapse similar sequences (e.g., those that are ≥ 97% similar to each other) into single replicate sequences. This process, also known as OTU picking, was once a common procedure, used both to dereplicate and to perform a sort of quick-and-dirty denoising (to absorb stochastic sequencing and PCR errors, which should be rare and similar to more abundant centroid sequences). Skip clustering in favor of denoising unless you have a really strong reason not to.
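Dereplication is the easiest of these steps to picture. Here is a toy sketch in plain Python (not QIIME 2 code; sample names and sequences are invented): collapse identical reads per sample while keeping their counts, which is precisely the skeleton of a feature table.

```python
from collections import Counter

# Toy dereplication: collapse identical reads within each sample,
# keeping count of how many times each unique sequence was observed.
sample_reads = {
    "sample-1": ["ACGT", "ACGT", "TTGG"],
    "sample-2": ["ACGT", "TTGG", "TTGG", "TTGG"],
}

# The result maps sample -> {unique sequence: count}: unique sequences
# become features, and the counts become feature table values.
feature_table = {s: Counter(reads) for s, reads in sample_reads.items()}
```

Denoising and clustering both build on this same structure; they just decide more aggressively which near-identical sequences should be treated as the same feature.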
Denoising¶
Let’s start with denoising, which is depicted on the right-hand side of Figure 3.
The denoising methods currently available in QIIME 2 include DADA2 and Deblur.
You can learn more about those methods by reading the original publications for each.
Examples of using both are presented in the Moving Pictures tutorial 🎥.
Note that deblur (and also vsearch dereplicate-sequences
) should be preceded by basic quality-score-based filtering, but this is unnecessary for DADA2.
Both Deblur and DADA2 contain internal chimera checking methods and abundance filtering, so additional filtering should not be necessary following these methods.
🦁🐐🐍
To put it simply, these methods filter out noisy sequences, correct errors in marginal sequences (in the case of DADA2), remove chimeric sequences, remove singletons, join denoised paired-end reads (in the case of DADA2), and then dereplicate those sequences. 😎
The features produced by denoising methods go by many names, usually some variant of “sequence variant” (SV), “amplicon SV” (ASV), “actual SV”, “exact SV”... We tend to use amplicon sequence variant (ASV) in the QIIME 2 documentation, and we’ll stick with that here. 📏
Clustering¶
Next we will discuss clustering methods. Dereplication (the simplest clustering method, effectively producing 100% OTUs, i.e., all unique sequences observed in the dataset) is the necessary starting point for all other clustering methods in QIIME 2 (Figure 4).

Figure 4:Flowchart of OTU clustering, chimera filtering, and abundance filtering workflows in QIIME 2.
q2-vsearch
implements three different OTU clustering strategies: de novo, closed reference, and open reference.
All should be preceded by basic quality-score-based filtering and followed by chimera filtering and aggressive OTU filtering (the treacherous trio, a.k.a. the Bokulich method).
🙈🙉🙊
A dedicated tutorial demonstrates use of several q2-vsearch
clustering methods.
Don’t forget to read the chimera filtering tutorial as well.
The feature table¶
The final products of all denoising and clustering methods/workflows are a FeatureTable
(feature table) artifact and a FeatureData[Sequence]
(representative sequences) artifact.
These are two of the most important artifact classes in a marker gene sequencing workflow, and are used for many downstream analyses, as discussed below.
Indeed, feature tables are crucial to any QIIME 2 analysis, as the central record of the counts of features per sample.
Such an important artifact deserves its own powerful plugin:
q2-feature-table plugin documentation
feature-table¶
This is a QIIME 2 plugin supporting operations on sample by feature tables, such as filtering, merging, and transforming tables.
- version: 2024.10.0
- website: https://github.com/qiime2/q2-feature-table
- user support: Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
Actions¶
Name | Type | Short Description |
---|---|---|
rarefy | method | Rarefy table |
subsample-ids | method | Subsample table |
presence-absence | method | Convert to presence/absence |
relative-frequency | method | Convert to relative frequencies |
transpose | method | Transpose a feature table. |
group | method | Group samples or features by a metadata column |
merge | method | Combine multiple tables |
merge-seqs | method | Combine collections of feature sequences |
merge-taxa | method | Combine collections of feature taxonomies |
rename-ids | method | Renames sample or feature ids in a table |
filter-samples | method | Filter samples from table |
filter-features-conditionally | method | Filter features from a table based on abundance and prevalence |
filter-features | method | Filter features from table |
filter-seqs | method | Filter features from sequences |
split | method | Split one feature table into many |
tabulate-feature-frequencies | method | Tabulate feature frequencies |
tabulate-sample-frequencies | method | Tabulate sample frequencies |
summarize | visualizer | Summarize table |
tabulate-seqs | visualizer | View sequence associated with each feature |
core-features | visualizer | Identify core features in table |
heatmap | visualizer | Generate a heatmap representation of a feature table |
summarize-plus | pipeline | Summarize table plus |
We will not discuss all actions of this plugin in detail here (some are mentioned below), but it performs many useful operations on feature tables so familiarize yourself with its documentation!
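To give a flavor of what these operations do, here is a toy sketch (plain Python, not QIIME 2 code; sample and feature names are invented) of two of the simplest transforms in the table above, relative-frequency and presence-absence, applied to a feature table represented as a dict of per-sample counts:

```python
# Toy feature table: sample -> {feature: count}.
table = {
    "sample-1": {"asv-1": 8, "asv-2": 2, "asv-3": 0},
    "sample-2": {"asv-1": 0, "asv-2": 5, "asv-3": 5},
}

def relative_frequency(table):
    # Convert raw counts to per-sample proportions
    # (the idea behind the relative-frequency action).
    out = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        out[sample] = {f: c / total for f, c in counts.items()}
    return out

def presence_absence(table):
    # Convert counts to 1/0 presence calls
    # (the idea behind the presence-absence action).
    return {s: {f: int(c > 0) for f, c in counts.items()}
            for s, counts in table.items()}

rel = relative_frequency(table)
pa = presence_absence(table)
```

In QIIME 2 these transforms produce new FeatureTable artifacts, so downstream actions can declare which flavor of table (frequencies, relative frequencies, presence/absence) they expect.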
Congratulations! You’ve made it through importing, demultiplexing, and denoising/clustering your data, which are the most complicated and difficult steps for most users (if only because there are so many ways to do it!). If you’ve made it this far, the rest should be easy. Now begins the fun. 🍾
Taxonomy classification (or annotation) and taxonomic analyses¶
For many experiments, investigators aim to identify the organisms that are present in a sample. For example:
- How do the genera or species in a system change over time?
- Are there any potential human pathogens in this patient’s sample?
- What’s swimming in my wine? 🍷🤑
We can do this by comparing our feature sequences (be they ASVs or OTUs) to a reference database of sequences with known taxonomic composition. Simply finding the closest alignment is not really good enough, because other sequences that are equally close matches, or nearly as close, may have different taxonomic annotations. So we use taxonomy classifiers to determine the closest taxonomic affiliation with some degree of confidence or consensus (which may not be a species name if one cannot be predicted with certainty!), based on alignment, k-mer frequencies, etc. Those interested in learning more about the relative performance of the taxonomy classifiers in QIIME 2 can read until the cows come home. And if you want to learn about how the algorithms work, you can refer to the Sequence Homology Searching chapter of An Introduction to Applied Bioinformatics. 🐄🐄🐄
Figure 5 shows what a taxonomy classification workflow might look like.

Figure 5:Flowchart of taxonomic annotation workflows in QIIME 2.
Alignment-based taxonomic classification¶
q2-feature-classifier
contains three different classification methods.
classify-consensus-blast
and classify-consensus-vsearch
are both alignment-based methods that find a consensus assignment across N top hits.
These methods take reference database FeatureData[Taxonomy]
and FeatureData[Sequence]
files directly, and do not need to be pre-trained.
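To illustrate the consensus idea (this is a toy sketch, not the actual BLAST/vsearch implementation), the snippet below keeps each taxonomic rank only while the top hits agree. The taxonomy strings are invented, and real methods accept a configurable consensus fraction (e.g., a majority of hits) rather than requiring unanimity:

```python
# Toy consensus taxonomy: given the taxonomies of a query's top
# reference hits, walk down the ranks and stop where the hits disagree.
def consensus_taxonomy(hit_taxonomies):
    levels = [t.split(";") for t in hit_taxonomies]
    consensus = []
    for ranks in zip(*levels):
        if len(set(ranks)) == 1:
            consensus.append(ranks[0])
        else:
            break  # hits disagree below this level; truncate here
    return ";".join(consensus)

hits = [
    "k__Bacteria;p__Firmicutes;g__Lactobacillus",
    "k__Bacteria;p__Firmicutes;g__Pediococcus",
]
assignment = consensus_taxonomy(hits)
```

This is why consensus classifiers sometimes return, say, a family-level assignment instead of a species name: the top hits simply did not agree any deeper than that.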
Machine-learning-based taxonomic classification¶
Machine-learning-based classification methods are available through classify-sklearn
, and theoretically can apply any of the classification methods available in scikit-learn.
These classifiers must be trained, e.g., to learn which features best distinguish each taxonomic group, adding an additional step to the classification process.
Classifier training is reference database- and marker-gene-specific and only needs to happen once per marker-gene/reference database combination; that classifier may then be re-used as many times as you like without needing to re-train!
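The train-once/classify-many structure is easiest to see in a toy example. The sketch below is a drastically simplified k-mer Naive Bayes classifier in plain Python, not scikit-learn or QIIME 2 code; the reference sequences, taxon names, and tiny k-mer size are all invented for illustration:

```python
from collections import Counter
from math import log

K = 2  # toy k-mer size; real classifiers typically use longer k-mers

def kmers(seq):
    return [seq[i:i + K] for i in range(len(seq) - K + 1)]

def fit(reference):
    # "Training": tally k-mer counts per taxon, once, from the reference.
    # The fitted model can then be reused for any number of queries.
    return {taxon: Counter(km for s in seqs for km in kmers(s))
            for taxon, seqs in reference.items()}

def classify(model, query):
    def score(counts):
        total = sum(counts.values())
        # Log-likelihood with Laplace smoothing over the 16 possible 2-mers.
        return sum(log((counts[km] + 1) / (total + 16)) for km in kmers(query))
    return max(model, key=lambda taxon: score(model[taxon]))

model = fit({"taxon-A": ["AAAAAA", "AAAATA"], "taxon-B": ["GCGCGC", "CGCGCG"]})
label = classify(model, "AAAAA")
```

The expensive part is `fit`; `classify` only reads the stored counts. That is exactly why a trained classifier artifact can be shared and reused across analyses without re-training.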
Training your own feature classifiers¶
If you’re working with an uncommon marker gene, you may need to train your own feature classifier.
This is possible following the steps in the classifier training tutorial.
The rescript
plugin also contains many tools that can be useful in preparing reference data for training classifiers.
Most users don’t need to train their own classifiers, however, as the QIIME 2 developers provide classifiers to the public for common marker genes in the QIIME 2 Library.
🎅🎁🎅🎁🎅🎁
Environment-weighted classifiers¶
Typical Naive Bayes classifiers treat all reference sequences as being equally likely to be observed in a sample. Environment-weighted taxonomic classifiers, on the other hand, use public microbiome data to weight taxa by their past frequency of being observed in specific sample types. This can improve the accuracy and the resolution of marker gene classification, and we recommend using weighted classifiers when possible. You can find environment-weighted classifiers for 16S rRNA in the QIIME 2 Library. If the environment type that you’re studying isn’t one of the ones that pre-trained classifiers are provided for, the “diverse weighted” classifiers may still be relevant. These are trained on weights from multiple different environment types, and have been shown to perform better than classifiers that assume equal weights for all taxa.
Which feature classification method is best?¶
They are all pretty good, otherwise we wouldn’t bother exposing them in q2-feature-classifier
.
But in general classify-sklearn
with a Naive Bayes classifier can slightly outperform other methods we’ve tested based on several criteria for classification of 16S rRNA gene and fungal ITS sequences.
It can be more difficult and frustrating for some users, however, since it requires that additional training step.
That training step can be memory intensive, becoming a barrier for some users who are unable to use the pre-trained classifiers.
Some users also prefer the alignment-based methods because their mode of operation is much more transparent and their parameters easier to manipulate.
Feature classification can be slow¶
Runtime of feature classifiers is a function of the number of sequences to be classified, and the number of reference sequences. If runtime is an issue for you, consider filtering low-abundance features out of your sequences file before classifying (e.g., those that are present in only a single sample), and use smaller reference databases if possible. In practice, in “normal size” sequencing experiments (whatever that means 😜) we see variations between a few minutes (a few hundred features) to hours or days (hundreds of thousands of features) for classification to complete. If you want to hang some numbers on there, check out our benchmarks for classifier runtime performance. 🏃⏱️
Feature classification can be memory intensive¶
Generally at least 8 GB of RAM are required, though 16 GB is better. The memory required is generally related to the size of the reference database, and in some cases 32 GB of RAM or more are needed.
Examples of using classify-sklearn
are shown in the Moving Pictures tutorial 🎥.
Figure 5 should make the other classifier methods reasonably clear.
All classifiers produce a FeatureData[Taxonomy]
artifact, tabulating the taxonomic annotation for each query sequence.
If you want to review those, or compare them across different classifiers, refer back to Reviewing information about observed sequences.
Taxonomic analysis¶
Taxonomic classification opens us up to a whole new world of possibilities. 🌎
Here are some popular actions that are enabled by having a FeatureData[Taxonomy]
artifact:
- Collapse your feature table with taxa collapse! This groups all features that share the same taxonomic assignment into a single feature. That taxonomic assignment becomes the feature ID in the new feature table. This feature table can be used in all the same ways as the original. Some users may be specifically interested in performing, e.g., taxonomy-based diversity analyses, but at the very least anyone assigning taxonomy is probably interested in assessing differential abundance of those taxa. Comparing differential abundance analyses using taxa as features versus using ASVs or OTUs as features can be diagnostic and informative for various analyses.
- Plot your taxonomic composition to see the abundance of various taxa in each of your samples. Check out taxa barplot and feature-table heatmap for more details. 📊
- Filter your feature table and feature sequences to remove certain taxonomic groups. This is useful for removing known contaminants or non-target groups, e.g., host DNA including mitochondrial or chloroplast sequences. It can also be useful for focusing on specific groups for deeper analysis. See the filtering tutorial for more details and examples. 🌿🐀
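The collapse operation itself is simple enough to sketch in a few lines of plain Python (not QIIME 2 code; the ASV IDs, taxonomy strings, and counts are invented): truncate each feature's taxonomy at the chosen level and sum counts that share the truncated string.

```python
# Toy version of taxa collapse: group ASV counts by taxonomic assignment,
# truncated at a chosen level (1 = kingdom, 2 = phylum, ... in this toy).
def collapse(table, taxonomy, level):
    out = {}
    for sample, counts in table.items():
        grouped = {}
        for asv, count in counts.items():
            # The first `level` ranks become the new feature ID.
            key = ";".join(taxonomy[asv].split(";")[:level])
            grouped[key] = grouped.get(key, 0) + count
        out[sample] = grouped
    return out

taxonomy = {
    "asv-1": "k__Bacteria;p__Firmicutes;g__Lactobacillus",
    "asv-2": "k__Bacteria;p__Firmicutes;g__Pediococcus",
    "asv-3": "k__Bacteria;p__Proteobacteria;g__Escherichia",
}
table = {"sample-1": {"asv-1": 4, "asv-2": 6, "asv-3": 5}}
phylum_table = collapse(table, taxonomy, level=2)
```

Note that two distinct ASVs with the same truncated taxonomy become one feature, which is exactly why collapsed tables are coarser but often easier to interpret.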
Sequence alignment and phylogenetic reconstruction¶
Some diversity metrics - notably Faith’s Phylogenetic Diversity (PD) and UniFrac - integrate the phylogenetic similarity of features. If you are sequencing phylogenetic markers (e.g., 16S rRNA genes), you can build a phylogenetic tree that can be used for computing these metrics.
The different options for aligning sequences and producing a phylogeny are shown in the flowchart below, and can be classified as de novo or reference-based. For a detailed discussion of alignment and phylogeny building, see the q2-phylogeny tutorial and q2-fragment-insertion. 🌳

Figure 6:Flowchart of alignment and phylogenetic reconstruction workflows in QIIME 2.
Now that we have our rooted phylogenetic tree (i.e., an artifact of class Phylogeny[Rooted]
), let’s use it!
Diversity analysis¶
In microbiome experiments, investigators frequently wonder about things like:
- How many different species/OTUs/ASVs are present in my samples?
- Which of my samples represent more phylogenetic diversity?
- Does the microbiome composition of my samples differ based on sample categories (e.g., healthy versus disease)?
- What factors (e.g., pH, elevation, blood pressure, body site, or host species just to name a few examples) are correlated with differences in microbial composition and biodiversity?
These questions can be answered by alpha- and beta-diversity analyses. Alpha diversity measures the level of diversity within individual samples. Beta diversity measures assess the dissimilarity between samples. We can then use this information to statistically test whether alpha diversity is different between groups of samples (indicating, for example, that those groups have more/less species richness) and whether beta diversity is greater across groups (indicating, for example, that samples within a group are more similar to each other than those in another group, suggesting that membership within these groups is shaping the microbial composition of those samples).
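Two of the most familiar metrics are easy to compute by hand, which helps demystify what the diversity artifacts contain. The sketch below is a toy in plain Python (not QIIME 2 code; the count vectors are invented), showing Shannon diversity (alpha) and Bray-Curtis dissimilarity (beta) on per-sample count vectors that share the same feature order:

```python
from math import log

def shannon(counts):
    # Alpha diversity: entropy (in bits) of the within-sample composition.
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * log(p, 2) for p in props)

def bray_curtis(a, b):
    # Beta diversity: fraction of total abundance that is unshared
    # between two samples (0 = identical, 1 = completely disjoint).
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))

s1, s2 = [10, 10, 0], [0, 10, 10]
alpha = shannon(s1)        # two equally abundant features -> 1 bit
beta = bray_curtis(s1, s2)
```

Neither of these toy metrics uses a phylogeny; metrics like Faith's PD and UniFrac additionally weight features by their evolutionary relatedness, which is why they require the rooted tree discussed above.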
Different types of diversity analyses in QIIME 2 are exemplified in the Moving Pictures tutorial 🎥. The actions used to generate diversity artifacts are shown in Figure 7, and many other tools can operate on these results.

Figure 7:Flowchart of diversity analysis workflows in QIIME 2.
The q2-diversity
plugin contains many different useful actions.
Check them out to learn more.
As you can see in the flowchart, the diversity core-metrics*
pipelines (core-metrics
and core-metrics-phylogenetic
) encompass many different core diversity commands, and in the process produce the main diversity-related artifacts that can be used in downstream analyses.
These are:
- SampleData[AlphaDiversity] artifacts, which contain alpha diversity estimates for each sample in your feature table. This is the chief artifact for alpha diversity analyses.
- DistanceMatrix artifacts, containing the pairwise distance/dissimilarity between each pair of samples in your feature table. This is the chief artifact for beta diversity analyses.
- PCoAResults artifacts, containing principal coordinates ordination results for each distance/dissimilarity metric. Principal coordinates analysis is a dimension reduction technique, facilitating visual comparisons of sample (dis)similarities in 2D or 3D space. Learn more about ordination in Ordination Methods for Ecologists and in the Machine learning in bioinformatics section of An Introduction to Applied Bioinformatics (Bolyen et al., 2018).
These are the main diversity-related artifacts.
We can re-use these data in all sorts of downstream analyses, or in the various actions of q2-diversity
shown in the flowchart.
Many of these actions are demonstrated in the Moving Pictures tutorial 🎥 so head on over there to learn more!
Note that there are many different alpha- and beta-diversity metrics that are available in QIIME 2. To learn more (and figure out whose paper you should be citing!), check out that neat resource, which was contributed by a friendly QIIME 2 user to enlighten all of us. Thanks Stephanie! 😁🙏😁🙏😁🙏
Fun with feature tables¶
At this point you have a feature table, taxonomy classification results, alpha diversity, and beta diversity results. Oh my! 🦁🐯🐻
Taxonomic and diversity analyses, as described above, are the basic types of analyses that most QIIME 2 users are probably going to need to perform at some point. However, this is only the beginning, and there are so many more advanced analyses at our fingertips. 🖐️⌨️

Figure 8:Flowchart of “downstream” analysis workflows in QIIME 2. Note: This figure needs some updates. Specifically, gneiss was deprecated and is no longer part of the amplicon distribution.
We are only going to give a brief overview, since each of these analyses has its own in-depth tutorial to guide us:
- Analyze longitudinal data: q2-longitudinal is a plugin for performing statistical analyses of longitudinal experiments, i.e., where samples are collected from individual patients/subjects/sites repeatedly over time. This includes longitudinal studies of alpha and beta diversity, and some really awesome, interactive plots. 📈🍝
- Predict the future (or the past) 🔮: q2-sample-classifier is a plugin for machine-learning 🤖 analyses of feature data. Both classification and regression models are supported. This allows you to do things like:
  - predict sample metadata as a function of feature data (e.g., can we use a fecal sample to predict cancer susceptibility? Or predict wine quality based on the microbial composition of grapes before fermentation?). 🍇
  - identify features that are predictive of different sample characteristics. 🚀
  - quantify rates of microbial maturation (e.g., to track normal microbiome development in the infant gut and the impacts of persistent malnutrition or antibiotics, diet, and delivery mode). 👶
  - predict outliers and mislabeled samples. 👹
- Differential abundance testing is used to determine which features are significantly more/less abundant in different groups of samples. QIIME 2 currently supports a few different approaches to differential abundance testing, including ancom-bc in q2-composition. 👾👾👾
- Evaluate and control data quality: q2-quality-control is a plugin for evaluating and controlling sequence data quality. This includes actions that:
  - test the accuracy of different bioinformatic or molecular methods, or of run-to-run quality variation. These actions are typically used if users have samples with known compositions, e.g., mock communities, since accuracy is calculated as the similarity between the observed and expected compositions, sequences, etc. But more creative uses may be possible...
  - filter sequences based on alignment to a reference database, or that contain specific short sections of DNA (e.g., primer sequences). This is useful for removing sequences that match a specific group of organisms, non-target DNA, or other nonsense. 🙃
And that’s just a brief overview! QIIME 2 continues to grow, so stay tuned for more plugins in future releases 📻, and keep your eyes peeled for stand-alone plugins that will continue to expand the functionality available in QIIME 2.
A good next step is to work through the Moving Pictures tutorial 🎥, if you haven’t done so already. That will help you learn how to actually use all of the functionality discussed here on real microbiome sequence data.
Now go forth and have fun! 💃
This is a guide for novice QIIME 2 users, and particularly for those who are new to microbiome research. For experienced users who are already well versed in microbiome analysis (and those who are averse to uncontrolled use of emoji) mosey on over to .
Welcome all newcomers! 👋 This guide will give you a conceptual overview of many of the plugins and actions available in QIIME 2, and guide you to relevant documentation for deeper exploration. As an Explanation article, this document doesn’t provide specific commands to run, but rather discusses at a higher level what your analysis workflow might entail. If you want specific commands that you can run and then adapt for your own work, our Tutorial articles are more aligned with what you’re looking for. We generally recommend starting with the Moving Pictures tutorial 🎥.
Consider this document to be your treasure map: QIIME 2 actions are the stepping stones on your path, and the flowcharts below will tell you where all the goodies are buried. 🗺️
Remember, many paths lead from the foot of the mountain, but at the peak we all gaze at the same moon. 🌕
Let’s get oriented¶
Flowcharts¶
Before we begin talking about specific plugins and actions, we will discuss a conceptual overview of a typical workflow for analyzing marker gene sequence data. And before we look at that overview, we must look at the key to our treasure map:

Figure 1:Each type of result (i.e., Artifacts and Visualizations) and action (i.e., methods, visualizers, and pipelines) is represented by a different color-coded node. The edges connecting each node are either solid (representing either required input or output) or dashed (representing optional input).
In the flowcharts below:
- Actions are labeled with the name of the plugin and the name of the action. To learn more about how to use a specific plugin and action, you can look it up in Available plugins.
- Artifacts are labeled by their artifact class.
- Visualizations are variously labeled as “visualization,” some name that represents the information shown in that visualization, or replaced with an image representing some of the tasty information you might find inside that visualization... 🍙
Useful points for beginners¶
Just a few more important points before we go further:
- The guide below is not exhaustive by any means.
It only covers some of the chief actions in the QIIME 2 amplicon distribution.
There are many more actions and plugins to discover.
Curious to learn more?
Refer to Available plugins, or if you’re working on the command line, call
qiime --help
. - The flowcharts below are designed to be as simple as possible, and hence omit many of the inputs (particularly optional inputs and metadata) and outputs (particularly statistical summaries and other minor outputs) and all of the possible parameters from most actions. Many additional actions (e.g., for displaying statistical summaries or fiddling with feature tables 🎻) are also omitted. Now that you know all about the help documentation (Available plugins), use it to learn more about individual actions, and other actions present in a plugin (hint: if a plugin has additional actions not described here, they are probably used to examine the output of other actions in that plugin).
- Metadata is a central concept in QIIME 2. We do not extensively discuss metadata in this guide. Instead, find discussion of metadata in .
- There is no one way to do things in QIIME 2. Nor is there a “QIIME 2” approach. Many paths lead from the foot of the mountain... ⛰️ Many of the plugins and actions in QIIME 2 wrap independent software or pre-existing methods. The QIIME 2 Framework (Q2F), discussed in Using QIIME 2, is the glue that makes the magic happen.
- Do not forget to cite appropriately! Unsure what to cite? To see a plugin or method’s relevant citations, refer to its help text. Or, better yet, view an artifact or visualization using QIIME 2 View. The “citations” tab will contain information on all relevant citations used for the generation of that file. Groovy. 😎
💃💃💃
Conceptual overview¶
Now let us examine a conceptual overview of the various possible workflows for examining marker gene sequence data (Figure 2). QIIME 2 allows you to enter or exit anywhere you’d like, so you can use QIIME 2 for any or all of these steps.

Figure 2:Flowchart providing an overview of a typical QIIME 2-based microbiome marker gene analysis workflow. The edges and nodes in this overview do not represent specific actions or data types, but instead represent conceptual categories, e.g., the basic types of data or analytical goals we might have in an experiment. Discussion of these steps and terms follows.
All data must be imported into a QIIME 2 artifact to be used by a QIIME 2 action (with the exception of metadata).
Most users start with either multiplexed (e.g., between one and three FASTQ files) or demultiplexed (e.g., a collection of n `.fastq` files, where n is the number of samples, or two times the number of samples) raw sequence data.
If possible, we recommend starting with demultiplexed sequence data - this prevents you from having to understand how sequences were multiplexed and how they need to be demultiplexed.
Whoever did your sequencing should already have that information and know how to do this.
Other users may start downstream, because some data processing has already been performed. For example, you can also start your QIIME 2 analysis with a feature table (`.biom` or `.tsv` file) generated with some other tool.
How to import and export data helps you identify what type of data you have, and provides specific instructions on importing different types of data.
Now that we understand that we can actually enter into this overview workflow at nearly any of the nodes, let us walk through individual sections.
- All marker gene sequencing experiments begin, at some point or another, as multiplexed sequence data. This is probably in `.fastq` files containing DNA sequences and quality scores for each base.
- The sequence data must be demultiplexed, such that each observed sequence read is associated with the sample that it was observed in, or discarded if its sample of origin could not be determined.
- Reads then undergo quality control (i.e., denoising), and amplicon sequence variants (ASVs) or operational taxonomic units (OTUs) should be defined. The goals of these steps are to remove sequencing errors and to dereplicate sequences to make downstream analyses more performant. These steps result in: a. a feature table that tabulates counts of ASVs (or OTUs) on a per-sample basis, and b. feature sequences - a mapping of ASV (or OTU) identifiers to the sequences they represent.
These artifacts (the feature table and feature sequences) are central to most downstream analyses. Common analyses include:
- Taxonomic annotation of sequences, which lets you determine which taxa (e.g., species, genera, phyla) are present.
- Alpha and beta diversity analyses, or measures of diversity within and between samples, respectively. These enable assessment of how similar or different samples are to one another. Some diversity metrics integrate measures of phylogenetic similarity between individual features. If you are sequencing phylogenetic markers (e.g., 16S rRNA genes), you can construct a phylogenetic tree from your feature sequences to use when calculating phylogenetic diversity metrics.
- Differential abundance testing, to determine which features (OTUs, ASVs, taxa, etc) are significantly more/less abundant in different experimental groups.
This is just the beginning, and many other statistical tests and plotting methods are at your finger tips in QIIME 2 and in the lands beyond. The world is your oyster. Let’s dive in. 🏊
Demultiplexing¶
Okay! Imagine we have just received some FASTQ data, hot off the sequencing instrument. Most next-gen sequencing instruments have the capacity to analyze hundreds or even thousands of samples in a single lane/run; we do so by multiplexing these samples, which is just a fancy word for mixing a whole bunch of stuff together. How do we know which sample each read came from? This is typically done by appending a unique barcode (a.k.a. index or tag) sequence to one or both ends of each sequence. Detecting these barcode sequences and mapping them back to the samples they belong to allows us to demultiplex our sequences.
You (or whoever prepared and sequenced your samples) should know which barcode is associated with each sample -- if you do not know, talk to your lab mates or sequencing center. Include this barcode information in your sample metadata file.
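Conceptually, demultiplexing is just a barcode-to-sample lookup. Here is a toy Python sketch of the idea (the barcodes, reads, and sample names are all invented for illustration; in practice `q2-demux` or `q2-cutadapt` does this for you, with proper error handling):

```python
# Toy demultiplexing: route each read to its sample by barcode prefix.
# Barcodes, reads, and sample names are made up for illustration.
barcode_to_sample = {"ACGT": "sample-1", "TGCA": "sample-2"}

reads = [
    ("ACGTAAGGTTCC", "IIIIIIIIIIII"),  # (sequence, quality string)
    ("TGCAGGCCAATT", "IIIIIIIIIIII"),
    ("NNNNGGCCAATT", "IIIIIIIIIIII"),  # unrecognized barcode
]

demuxed = {}     # sample -> list of barcode-trimmed reads
unassigned = []  # reads whose barcode matched no known sample

for seq, qual in reads:
    barcode, rest = seq[:4], seq[4:]
    sample = barcode_to_sample.get(barcode)
    if sample is None:
        unassigned.append((seq, qual))  # discard or inspect later
    else:
        # Trim the barcode off both the sequence and the quality string.
        demuxed.setdefault(sample, []).append((rest, qual[4:]))
```

Reads whose barcodes match nothing end up in `unassigned`, mirroring how real demultiplexers discard reads whose sample of origin cannot be determined.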
The process of demultiplexing (as it occurs in QIIME 2) will look something like Figure 3 (ignore the right-hand side of this flow chart for now).

Figure 3:Flowchart of demultiplexing and denoising workflows in QIIME 2.
This flowchart describes all demultiplexing steps that are currently possible in QIIME 2, depending on the type of raw data you have imported.
Usually only one of the different demultiplexing actions available in `q2-demux` or `q2-cutadapt` will be applicable for your data, and that is all you will need.
Read more about demultiplexing and give it a spin with the Moving Pictures tutorial 🎥. That tutorial covers Earth Microbiome Project format data.
If instead you have barcodes and primers in-line in your reads, see the cutadapt tutorials.
Have dual-indexed reads or mixed-orientation reads or some other unusual format? Search the QIIME 2 Forum for advice.
Paired-end read joining¶
If you’re working with Illumina paired-end reads, they will typically need to be joined at some point in the analysis.
If you read How to merge Illumina paired-end reads, you will see that this happens automatically during denoising with `q2-dada2`. However, if you want to use `q2-deblur` or an OTU clustering method (as described in more detail below), use `q2-vsearch` to join these reads before proceeding, as shown in Figure 3.
If you are beginning to pull your hair and foam at the mouth, do not despair: QIIME 2 tends to get easier the further we travel in the “general overview” (Figure 2). Importing and demultiplexing raw sequencing data happens to be the most frustrating part for most new users because there are so many different ways that marker gene data can be generated. But once you get the hang of it, it’s a piece of cake. 🍰
Denoising and clustering¶
Congratulations on getting this far! Denoising and clustering steps are slightly less confusing than importing and demultiplexing! 🎉😬🎉
The names for these steps are very descriptive:
- We denoise our sequences to remove and/or correct noisy reads. 🔊
- We dereplicate our sequences to reduce repetition and file size/memory requirements in downstream steps (don’t worry! we keep count of each replicate). 🕵️
- We (optionally) cluster sequences to collapse similar sequences (e.g., those that are ≥ 97% similar to each other) into single replicate sequences. This process, also known as OTU picking, was once a common procedure, used to simultaneously dereplicate but also perform a sort of quick-and-dirty denoising procedure (to capture stochastic sequencing and PCR errors, which should be rare and similar to more abundant centroid sequences). Skip clustering in favor of denoising, unless you have a really strong reason not to.
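Dereplication, at least, is easy to picture: collapse identical sequences into unique features and keep per-sample counts. A minimal Python sketch with invented sequences (real dereplication is performed by actions like `vsearch dereplicate-sequences`):

```python
from collections import Counter

# Toy per-sample reads after quality control; sequences are invented.
reads = {
    "sample-1": ["AAAT", "AAAT", "CCCG"],
    "sample-2": ["AAAT", "CCCG", "CCCG", "CCCG"],
}

# Dereplicate: one entry per unique sequence, with per-sample counts.
table = {sample: Counter(seqs) for sample, seqs in reads.items()}

# The unique "features" observed across the whole dataset (100% OTUs).
features = sorted(set().union(*table.values()))
```

Note that no information is lost here: every read is still accounted for in the counts, which is exactly why dereplication makes downstream steps cheaper without changing the results.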
Denoising¶
Let’s start with denoising, which is depicted on the right-hand side of Figure 3.
The denoising methods currently available in QIIME 2 include DADA2 and Deblur.
You can learn more about those methods by reading the original publications for each.
Examples of using both are presented in the Moving Pictures tutorial 🎥.
Note that deblur (and also `vsearch dereplicate-sequences`) should be preceded by basic quality-score-based filtering, but this is unnecessary for DADA2.
Both Deblur and DADA2 contain internal chimera checking methods and abundance filtering, so additional filtering should not be necessary following these methods.
🦁🐐🐍
To put it simply, these methods filter out noisy sequences, correct errors in marginal sequences (in the case of DADA2), remove chimeric sequences, remove singletons, join denoised paired-end reads (in the case of DADA2), and then dereplicate those sequences. 😎
The features produced by denoising methods go by many names, usually some variant of “sequence variant” (SV), “amplicon SV” (ASV), “actual SV”, “exact SV”... We tend to use amplicon sequence variant (ASV) in the QIIME 2 documentation, and we’ll stick with that here. 📏
Clustering¶
Next we will discuss clustering methods. Dereplication (the simplest clustering method, effectively producing 100% OTUs, i.e., all unique sequences observed in the dataset) is depicted in Figure 4, and is the necessary starting point for all other clustering methods in QIIME 2.

Figure 4:Flowchart of OTU clustering, chimera filtering, and abundance filtering workflows in QIIME 2.
`q2-vsearch` implements three different OTU clustering strategies: de novo, closed reference, and open reference. All should be preceded by basic quality-score-based filtering and followed by chimera filtering and aggressive OTU filtering (the treacherous trio, a.k.a. the Bokulich method). 🙈🙉🙊
A dedicated tutorial demonstrates use of several `q2-vsearch` clustering methods. Don’t forget to read the chimera filtering tutorial as well.
The feature table¶
The final products of all denoising and clustering methods/workflows are a `FeatureTable` (feature table) artifact and a `FeatureData[Sequence]` (representative sequences) artifact.
These are two of the most important artifact classes in a marker gene sequencing workflow, and are used for many downstream analyses, as discussed below.
Indeed, feature tables are crucial to any QIIME 2 analysis, as the central record of the counts of features per sample.
Such an important artifact deserves its own powerful plugin:
q2-feature-table plugin documentation
feature-table¶
This is a QIIME 2 plugin supporting operations on sample by feature tables, such as filtering, merging, and transforming tables.
- version: 2024.10.0
- website: https://github.com/qiime2/q2-feature-table
- user support: Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
Actions¶
Name | Type | Short Description |
---|---|---|
rarefy | method | Rarefy table |
subsample-ids | method | Subsample table |
presence-absence | method | Convert to presence/absence |
relative-frequency | method | Convert to relative frequencies |
transpose | method | Transpose a feature table. |
group | method | Group samples or features by a metadata column |
merge | method | Combine multiple tables |
merge-seqs | method | Combine collections of feature sequences |
merge-taxa | method | Combine collections of feature taxonomies |
rename-ids | method | Renames sample or feature ids in a table |
filter-samples | method | Filter samples from table |
filter-features-conditionally | method | Filter features from a table based on abundance and prevalence |
filter-features | method | Filter features from table |
filter-seqs | method | Filter features from sequences |
split | method | Split one feature table into many |
tabulate-feature-frequencies | method | Tabulate feature frequencies |
tabulate-sample-frequencies | method | Tabulate sample frequencies |
summarize | visualizer | Summarize table |
tabulate-seqs | visualizer | View sequence associated with each feature |
core-features | visualizer | Identify core features in table |
heatmap | visualizer | Generate a heatmap representation of a feature table |
summarize-plus | pipeline | Summarize table plus |
We will not discuss all actions of this plugin in detail here (some are mentioned below), but it performs many useful operations on feature tables, so familiarize yourself with its documentation!
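To make one of those actions concrete, rarefying draws an even-depth random subsample, without replacement, from each sample’s counts. A toy version of the idea (not `feature-table rarefy` itself; the feature IDs are invented):

```python
import random

def rarefy(counts, depth, seed=42):
    """Subsample a {feature: count} mapping down to `depth` total
    observations without replacement. Returns None if the sample
    is too shallow (such samples are dropped from a rarefied table)."""
    pool = [f for f, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    rng = random.Random(seed)  # seeded for reproducibility
    subsample = rng.sample(pool, depth)  # without replacement
    rare = {}
    for f in subsample:
        rare[f] = rare.get(f, 0) + 1
    return rare

rare = rarefy({"ASV-1": 8, "ASV-2": 2}, depth=5)
```

Every surviving sample ends up with exactly `depth` observations, which is what makes rarefied samples directly comparable in many diversity analyses.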
Congratulations! You’ve made it through importing, demultiplexing, and denoising/clustering your data, which are the most complicated and difficult steps for most users (if only because there are so many ways to do it!). If you’ve made it this far, the rest should be easy. Now begins the fun. 🍾
Taxonomy classification (or annotation) and taxonomic analyses¶
For many experiments, investigators aim to identify the organisms that are present in a sample. For example:
- How do the genera or species in a system change over time?
- Are there any potential human pathogens in this patient’s sample?
- What’s swimming in my wine? 🍷🤑
We can do this by comparing our feature sequences (be they ASVs or OTUs) to a reference database of sequences with known taxonomic composition. Simply finding the closest alignment is not really good enough -- because other sequences that are equally close matches or nearly as close may have different taxonomic annotations. So we use taxonomy classifiers to determine the closest taxonomic affiliation with some degree of confidence or consensus (which may not be a species name if one cannot be predicted with certainty!), based on alignment, k-mer frequencies, etc. Those interested in learning more about the relative performance of the taxonomy classifiers in QIIME 2 can read until the cows come home. And if you want to learn about how the algorithms work, you can refer to the Sequence Homology Searching chapter of An Introduction to Applied Bioinformatics. 🐄🐄🐄
Figure 5 shows what a taxonomy classification workflow might look like.

Figure 5:Flowchart of taxonomic annotation workflows in QIIME 2.
Alignment-based taxonomic classification¶
`q2-feature-classifier` contains three different classification methods. `classify-consensus-blast` and `classify-consensus-vsearch` are both alignment-based methods that find a consensus assignment across N top hits. These methods take reference database `FeatureData[Taxonomy]` and `FeatureData[Sequence]` files directly, and do not need to be pre-trained.
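The consensus idea can be sketched in a few lines: split each top hit’s taxonomy into ranks and keep ranks only while enough hits agree. This is a simplified illustration with invented hits and an arbitrary threshold, not the `q2-feature-classifier` implementation:

```python
def consensus_taxonomy(hits, min_consensus=0.51):
    """hits: semicolon-delimited taxonomy strings for a query's top N
    reference matches. Keep each rank (domain downward) only while the
    most common value at that rank reaches min_consensus agreement."""
    split = [h.split(";") for h in hits]
    consensus = []
    for rank_values in zip(*split):  # walk ranks from domain down
        best = max(set(rank_values), key=rank_values.count)
        if rank_values.count(best) / len(hits) >= min_consensus:
            consensus.append(best)
        else:
            break  # agreement lost at this rank: stop assigning deeper
    return ";".join(consensus)

# Invented top-3 hits for one query sequence:
hits = [
    "k__Bacteria;p__Firmicutes;g__Lactobacillus",
    "k__Bacteria;p__Firmicutes;g__Lactobacillus",
    "k__Bacteria;p__Firmicutes;g__Pediococcus",
]
```

With a stricter threshold the assignment simply stops at a shallower rank, which is why consensus classifiers may return a genus or family rather than a species when the reference hits disagree.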
Machine-learning-based taxonomic classification¶
Machine-learning-based classification methods are available through `classify-sklearn`, and theoretically can apply any of the classification methods available in scikit-learn.
These classifiers must be trained, e.g., to learn which features best distinguish each taxonomic group, adding an additional step to the classification process.
Classifier training is reference database- and marker-gene-specific and only needs to happen once per marker-gene/reference database combination; that classifier may then be re-used as many times as you like without needing to re-train!
Training your own feature classifiers¶
If you’re working with an uncommon marker gene, you may need to train your own feature classifier.
This is possible following the steps in the classifier training tutorial.
The `rescript` plugin also contains many tools that can be useful in preparing reference data for training classifiers.
Most users don’t need to train their own classifiers, however, as the QIIME 2 developers provide classifiers to the public for common marker genes in the QIIME 2 Library.
🎅🎁🎅🎁🎅🎁
Environment-weighted classifiers¶
Typical Naive Bayes classifiers treat all reference sequences as being equally likely to be observed in a sample. Environment-weighted taxonomic classifiers, on the other hand, use public microbiome data to weight taxa by their past frequency of being observed in specific sample types. This can improve the accuracy and the resolution of marker gene classification, and we recommend using weighted classifiers when possible. You can find environment-weighted classifiers for 16S rRNA in the QIIME 2 Library. If the environment type that you’re studying isn’t one of the ones that pre-trained classifiers are provided for, the “diverse weighted” classifiers may still be relevant. These are trained on weights from multiple different environment types, and have been shown to perform better than classifiers that assume equal weights for all taxa.
Which feature classification method is best?¶
They are all pretty good, otherwise we wouldn’t bother exposing them in `q2-feature-classifier`. But in general `classify-sklearn` with a Naive Bayes classifier can slightly outperform other methods we’ve tested, based on several criteria, for classification of 16S rRNA gene and fungal ITS sequences.
It can be more difficult and frustrating for some users, however, since it requires that additional training step.
That training step can be memory intensive, becoming a barrier for some users who are unable to use the pre-trained classifiers.
Some users also prefer the alignment-based methods because their mode of operation is much more transparent and their parameters easier to manipulate.
Feature classification can be slow¶
Runtime of feature classifiers is a function of the number of sequences to be classified and the number of reference sequences. If runtime is an issue for you, consider filtering low-abundance features out of your sequences file before classifying (e.g., those that are present in only a single sample), and use smaller reference databases if possible. In practice, in “normal size” sequencing experiments (whatever that means 😜) we see variations between a few minutes (a few hundred features) to hours or days (hundreds of thousands of features) for classification to complete. If you want to hang some numbers on there, check out our benchmarks for classifier runtime performance. 🏃⏱️
Feature classification can be memory intensive¶
Generally at least 8 GB of RAM are required, though 16 GB is better. Memory requirements are generally related to the size of the reference database, and in some cases 32 GB of RAM or more are required.
Examples of using `classify-sklearn` are shown in the Moving Pictures tutorial 🎥.
Figure 5 should make the other classifier methods reasonably clear.
All classifiers produce a `FeatureData[Taxonomy]` artifact, tabulating the taxonomic annotation for each query sequence.
If you want to review those, or compare them across different classifiers, refer back to Reviewing information about observed sequences.
Taxonomic analysis¶
Taxonomic classification opens us up to a whole new world of possibilities. 🌎
Here are some popular actions that are enabled by having a `FeatureData[Taxonomy]` artifact:
- Collapse your feature table with `taxa collapse`! This groups all features that share the same taxonomic assignment into a single feature. That taxonomic assignment becomes the feature ID in the new feature table. This feature table can be used in all the same ways as the original. Some users may be specifically interested in performing, e.g., taxonomy-based diversity analyses, but at the very least anyone assigning taxonomy is probably interested in assessing differential abundance of those taxa. Comparing differential abundance analyses using taxa as features versus using ASVs or OTUs as features can be diagnostic and informative for various analyses.
- Plot your taxonomic composition to see the abundance of various taxa in each of your samples. Check out `taxa barplot` and `feature-table heatmap` for more details. 📊
- Filter your feature table and feature sequences to remove certain taxonomic groups. This is useful for removing known contaminants or non-target groups, e.g., host DNA including mitochondrial or chloroplast sequences. It can also be useful for focusing on specific groups for deeper analysis. See the filtering tutorial for more details and examples. 🌿🐀
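The collapse operation itself is simple to picture: re-key each feature by its taxonomic assignment and sum the counts. A toy sketch with invented ASV IDs and taxonomies (the real action is `taxa collapse`):

```python
# Per-sample ASV counts and ASV -> taxonomy assignments (invented).
table = {"sample-1": {"ASV-1": 5, "ASV-2": 3, "ASV-3": 2}}
taxonomy = {
    "ASV-1": "k__Bacteria;p__Firmicutes",
    "ASV-2": "k__Bacteria;p__Firmicutes",
    "ASV-3": "k__Bacteria;p__Bacteroidota",
}

# Collapse: features sharing a taxonomic assignment merge into one
# feature whose ID is the assignment and whose count is the sum.
collapsed = {}
for sample, counts in table.items():
    row = {}
    for asv, n in counts.items():
        taxon = taxonomy[asv]
        row[taxon] = row.get(taxon, 0) + n
    collapsed[sample] = row
```

The collapsed table has the same shape as the original (samples by features), which is why it can be fed to the same downstream analyses.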
Sequence alignment and phylogenetic reconstruction¶
Some diversity metrics - notably Faith’s Phylogenetic Diversity (PD) and UniFrac - integrate the phylogenetic similarity of features. If you are sequencing phylogenetic markers (e.g., 16S rRNA genes), you can build a phylogenetic tree that can be used for computing these metrics.
The different options for aligning sequences and producing a phylogeny are shown in the flowchart below, and can be classified as de novo or reference-based. For a detailed discussion of alignment and phylogeny building, see the q2-phylogeny tutorial and q2-fragment-insertion. 🌳

Figure 6:Flowchart of alignment and phylogenetic reconstruction workflows in QIIME 2.
Now that we have our rooted phylogenetic tree (i.e., an artifact of class `Phylogeny[Rooted]`), let’s use it!
Diversity analysis¶
In microbiome experiments, investigators frequently wonder about things like:
- How many different species/OTUs/ASVs are present in my samples?
- Which of my samples represent more phylogenetic diversity?
- Does the microbiome composition of my samples differ based on sample categories (e.g., healthy versus disease)?
- What factors (e.g., pH, elevation, blood pressure, body site, or host species just to name a few examples) are correlated with differences in microbial composition and biodiversity?
These questions can be answered by alpha- and beta-diversity analyses. Alpha diversity measures the level of diversity within individual samples. Beta diversity measures assess the dissimilarity between samples. We can then use this information to statistically test whether alpha diversity is different between groups of samples (indicating, for example, that those groups have more/less species richness) and whether beta diversity is greater across groups (indicating, for example, that samples within a group are more similar to each other than those in another group, suggesting that membership within these groups is shaping the microbial composition of those samples).
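To ground those definitions, here is a toy computation of one alpha metric (Shannon diversity) and one beta metric (Bray-Curtis dissimilarity) on invented counts; in QIIME 2 these are computed for you by `q2-diversity`:

```python
import math

def shannon(counts):
    """Shannon diversity (log base 2) of one sample's feature counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total)
                for n in counts.values() if n > 0)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples' feature counts."""
    feats = set(a) | set(b)
    num = sum(abs(a.get(f, 0) - b.get(f, 0)) for f in feats)
    den = sum(a.get(f, 0) + b.get(f, 0) for f in feats)
    return num / den

# Invented samples: s1 has two equally abundant features, s2 has one.
s1 = {"ASV-1": 10, "ASV-2": 10}
s2 = {"ASV-1": 20}
```

Evenly distributed features maximize Shannon diversity for a given number of features, and Bray-Curtis ranges from 0 (identical composition) to 1 (no shared features), which is the sense in which these metrics measure within-sample diversity and between-sample dissimilarity.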
Different types of diversity analyses in QIIME 2 are exemplified in the Moving Pictures tutorial 🎥. The actions used to generate diversity artifacts are shown in Figure 7, and many other tools can operate on these results.

Figure 7:Flowchart of diversity analysis workflows in QIIME 2.
The `q2-diversity` plugin contains many different useful actions. Check them out to learn more.
As you can see in the flowchart, the diversity `core-metrics*` pipelines (`core-metrics` and `core-metrics-phylogenetic`) encompass many different core diversity commands, and in the process produce the main diversity-related artifacts that can be used in downstream analyses. These are:
- `SampleData[AlphaDiversity]` artifacts, which contain alpha diversity estimates for each sample in your feature table. This is the chief artifact for alpha diversity analyses.
- `DistanceMatrix` artifacts, containing the pairwise distance/dissimilarity between each pair of samples in your feature table. This is the chief artifact for beta diversity analyses.
- `PCoAResults` artifacts, containing principal coordinates ordination results for each distance/dissimilarity metric. Principal coordinates analysis is a dimension reduction technique, facilitating visual comparisons of sample (dis)similarities in 2D or 3D space. Learn more about ordination in Ordination Methods for Ecologists and in the Machine learning in bioinformatics section of An Introduction to Applied Bioinformatics (Bolyen et al. (2018)).
We can re-use these artifacts in all sorts of downstream analyses, or in the various actions of `q2-diversity` shown in the flowchart.
Many of these actions are demonstrated in the Moving Pictures tutorial 🎥 so head on over there to learn more!
Note that there are many different alpha- and beta-diversity metrics that are available in QIIME 2. To learn more (and figure out whose paper you should be citing!), check out that neat resource, which was contributed by a friendly QIIME 2 user to enlighten all of us. Thanks Stephanie! 😁🙏😁🙏😁🙏
Fun with feature tables¶
At this point you have a feature table, taxonomy classification results, alpha diversity, and beta diversity results. Oh my! 🦁🐯🐻
Taxonomic and diversity analyses, as described above, are the basic types of analyses that most QIIME 2 users are probably going to need to perform at some point. However, this is only the beginning, and there are so many more advanced analyses at our fingertips. 🖐️⌨️

Figure 8:Flowchart of “downstream” analysis workflows in QIIME 2. Note: This figure needs some updates. Specifically, gneiss was deprecated and is no longer part of the amplicon distribution.
We are only going to give a brief overview, since each of these analyses has its own in-depth tutorial to guide us:
- Analyze longitudinal data: `q2-longitudinal` is a plugin for performing statistical analyses of longitudinal experiments, i.e., where samples are collected from individual patients/subjects/sites repeatedly over time. This includes longitudinal studies of alpha and beta diversity, and some really awesome, interactive plots. 📈🍝
- Predict the future (or the past) 🔮: `q2-sample-classifier` is a plugin for machine-learning 🤖 analyses of feature data. Both classification and regression models are supported. This allows you to do things like:
  - predict sample metadata as a function of feature data (e.g., can we use a fecal sample to predict cancer susceptibility? Or predict wine quality based on the microbial composition of grapes before fermentation?). 🍇
  - identify features that are predictive of different sample characteristics. 🚀
  - quantify rates of microbial maturation (e.g., to track normal microbiome development in the infant gut and the impacts of persistent malnutrition or antibiotics, diet, and delivery mode). 👶
  - predict outliers and mislabeled samples. 👹
- Differential abundance testing is used to determine which features are significantly more/less abundant in different groups of samples. QIIME 2 currently supports a few different approaches to differential abundance testing, including `ancom-bc` in `q2-composition`. 👾👾👾
- Evaluate and control data quality: `q2-quality-control` is a plugin for evaluating and controlling sequence data quality. This includes actions that:
  - test the accuracy of different bioinformatic or molecular methods, or of run-to-run quality variation. These actions are typically used if users have samples with known compositions, e.g., mock communities, since accuracy is calculated as the similarity between the observed and expected compositions, sequences, etc. But more creative uses may be possible...
  - filter sequences based on alignment to a reference database, or that contain specific short sections of DNA (e.g., primer sequences). This is useful for removing sequences that match a specific group of organisms, non-target DNA, or other nonsense. 🙃
And that’s just a brief overview! QIIME 2 continues to grow, so stay tuned for more plugins in future releases 📻, and keep your eyes peeled for stand-alone plugins that will continue to expand the functionality available in QIIME 2.
A good next step is to work through the Moving Pictures tutorial 🎥, if you haven’t done so already. That will help you learn how to actually use all of the functionality discussed here on real microbiome sequence data.
Now go forth and have fun! 💃
This is a guide for novice QIIME 2 users, and particularly for those who are new to microbiome research. For experienced users who are already well versed in microbiome analysis (and those who are averse to uncontrolled use of emoji) mosey on over to .
Welcome all newcomers! 👋 This guide will give you a conceptual overview of many of the plugins and actions available in QIIME 2, and guide you to relevant documentation for deeper exploration. As an Explanation article, this document doesn’t provide specific commands to run, but rather discusses at a higher level what your analysis workflow might entail. If you want specific commands that you can run and then adapt for your own work, our Tutorial articles are more aligned with what you’re looking for. We generally recommend starting with the Moving Pictures tutorial 🎥.
Consider this document to be your treasure map: QIIME 2 actions are the stepping stones on your path, and the flowcharts below will tell you where all the goodies are buried. 🗺️
Remember, many paths lead from the foot of the mountain, but at the peak we all gaze at the same moon. 🌕
Let’s get oriented¶
Flowcharts¶
Before we begin talking about specific plugins and actions, we will discuss a conceptual overview of a typical workflow for analyzing marker gene sequence data. And before we look at that overview, we must look at the key to our treasure map:

Figure 1:Each type of result (i.e., Artifacts and Visualizations) and action (i.e., methods, visualizers, and pipelines) is represented by a different color-coded node. The edges connecting each node are either solid (representing either required input or output) or dashed (representing optional input).
In the flowcharts below:
- Actions are labeled with the name of the plugin and the name of the action. To learn more about how to use a specific plugin and action, you can look it up in Available plugins.
- Artifacts are labeled by their artifact class.
- Visualizations are variously labeled as “visualization,” some name that represents the information shown in that visualization, or replaced with an image representing some of the tasty information you might find inside that visualization... 🍙
Useful points for beginners¶
Just a few more important points before we go further:
- The guide below is not exhaustive by any means.
It only covers some of the chief actions in the QIIME 2 amplicon distribution.
There are many more actions and plugins to discover.
Curious to learn more?
Refer to Available plugins, or if you’re working on the command line, call
qiime --help
. - The flowcharts below are designed to be as simple as possible, and hence omit many of the inputs (particularly optional inputs and metadata) and outputs (particularly statistical summaries and other minor outputs) and all of the possible parameters from most actions. Many additional actions (e.g., for displaying statistical summaries or fiddling with feature tables 🎻) are also omitted. Now that you know all about the help documentation (Available plugins), use it to learn more about individual actions, and other actions present in a plugin (hint: if a plugin has additional actions not described here, they are probably used to examine the output of other actions in that plugin).
- Metadata is a central concept in QIIME 2. We do not extensively discuss metadata in this guide. Instead, find discussion of metadata in .
- There is no one way to do things in QIIME 2. Nor is there a “QIIME 2” approach. Many paths lead from the foot of the mountain... ⛰️ Many of the plugins and actions in QIIME 2 wrap independent software or pre-existing methods. The QIIME 2 Framework (Q2F), discussed in Using QIIME 2, is the glue that makes the magic happen.
- Do not forget to cite appropriately! Unsure what to cite? To see the a plugin or method’s relevant citations, refer its help text. Or, better yet, view an artifact or visualization using QIIME 2 View. The “citations” tab will contain information on all relevant citations used for the generation of that file. Groovy. 😎
💃💃💃
Conceptual overview¶
Now let us examine a conceptual overview of the various possible workflows for examining marker gene sequence data Figure 2. QIIME 2 allows you to enter or exit anywhere you’d like, so you can use QIIME 2 for any or all of these steps.

Figure 2:Flowchart providing an overview of a typical QIIME 2-based microbiome marker gene analysis workflow. The edges and nodes in this overview do not represent specific actions or data types, but instead represent conceptual categories, e.g., the basic types of data or analytical goals we might have in an experiment. Discussion of these steps and terms follows.
All data must be imported into a QIIME 2 artifact to be used by a QIIME 2 action (with the exception of metadata).
Most users start with either multiplexed (e.g., between one and three FASTQ files) or demultiplexed (e.g., a collection of n .fastq files, where n is the number of samples, or two times the number of samples) raw sequence data.
If possible, we recommend starting with demultiplexed sequence data - this prevents you from having to understand how sequences were multiplexed and how they need to be demultiplexed.
Whoever did your sequencing should already have that information and know how to do this.
Other users may start downstream, because some data processing has already been performed. For example, you can also start your QIIME 2 analysis with a feature table (.biom or .tsv file) generated with some other tool.
How to import and export data helps you identify what type of data you have, and provides specific instructions on importing different types of data.
Now that we understand that we can actually enter into this overview workflow at nearly any of the nodes, let us walk through individual sections.
- All marker gene sequencing experiments begin, at some point or another, as multiplexed sequence data. This is probably in .fastq files containing DNA sequences and quality scores for each base.
- The sequence data must be demultiplexed, such that each observed sequence read is associated with the sample that it was observed in, or discarded if its sample of origin could not be determined.
- Reads then undergo quality control (i.e., denoising), and amplicon sequence variants (ASVs) or operational taxonomic units (OTUs) should be defined. The goals of these steps are to remove sequencing errors and to dereplicate sequences to make downstream analyses more performant. These steps result in: a. a feature table that tabulates counts of ASVs (or OTUs) on a per-sample basis, and b. feature sequences - a mapping of ASV (or OTU) identifiers to the sequences they represent.
These artifacts (the feature table and feature sequences) are central to most downstream analyses. Common analyses include:
- Taxonomic annotation of sequences, which lets you determine which taxa (e.g., species, genera, phyla) are present.
- Alpha and beta diversity analyses, or measures of diversity within and between samples, respectively. These enable assessment of how similar or different samples are to one another. Some diversity metrics integrate measures of phylogenetic similarity between individual features. If you are sequencing phylogenetic markers (e.g., 16S rRNA genes), you can construct a phylogenetic tree from your feature sequences to use when calculating phylogenetic diversity metrics.
- Differential abundance testing, to determine which features (OTUs, ASVs, taxa, etc.) are significantly more/less abundant in different experimental groups.
This is just the beginning, and many other statistical tests and plotting methods are at your fingertips in QIIME 2 and in the lands beyond. The world is your oyster. Let’s dive in. 🏊
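To make the two central artifacts concrete before we dive in, here is a toy sketch (plain Python, not QIIME 2 code; the feature IDs, sequences, and counts are all invented for illustration) of what a feature table and feature sequences represent:

```python
# Toy illustration (not QIIME 2 code): the two central artifacts of a marker
# gene workflow, represented as plain Python data structures.

# Feature sequences: a mapping of feature (ASV/OTU) IDs to sequences.
feature_sequences = {
    "asv1": "ACGTACGT",
    "asv2": "ACGTTCGT",
}

# Feature table: per-sample counts of each feature.
feature_table = {
    "sampleA": {"asv1": 10, "asv2": 0},
    "sampleB": {"asv1": 3, "asv2": 7},
}

# Most downstream analyses consume one or both of these structures, e.g.,
# the total sequence count (sampling depth) per sample:
sample_depths = {s: sum(c.values()) for s, c in feature_table.items()}
print(sample_depths)  # {'sampleA': 10, 'sampleB': 10}
```

Real QIIME 2 artifacts carry much more provenance and type information than this, but the underlying bookkeeping is the same: counts per sample, and a lookup from feature ID to sequence.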
Demultiplexing¶
Okay! Imagine we have just received some FASTQ data, hot off the sequencing instrument. Most next-gen sequencing instruments have the capacity to analyze hundreds or even thousands of samples in a single lane/run; we do so by multiplexing these samples, which is just a fancy word for mixing a whole bunch of stuff together. How do we know which sample each read came from? This is typically done by appending a unique barcode (a.k.a. index or tag) sequence to one or both ends of each sequence. Detecting these barcode sequences and mapping them back to the samples they belong to allows us to demultiplex our sequences.
You (or whoever prepared and sequenced your samples) should know which barcode is associated with each sample -- if you do not know, talk to your lab mates or sequencing center. Include this barcode information in your sample metadata file.
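Conceptually, demultiplexing is just a lookup from barcode to sample. Here is a toy Python sketch of that core logic (not a QIIME 2 command; the barcodes, reads, and sample names are invented, and real demultiplexers also handle quality scores, barcode errors, and paired reads):

```python
# Toy illustration (not QIIME 2 code): assign each read to a sample by its
# leading barcode, or discard it if the barcode is unrecognized.
barcode_to_sample = {"AAC": "sampleA", "GGT": "sampleB"}  # from sample metadata
BARCODE_LEN = 3

reads = ["AACTTGACA", "GGTCCATGA", "TTTGACCTA"]  # barcode = first 3 bases

demuxed = {sample: [] for sample in barcode_to_sample.values()}
discarded = []
for read in reads:
    barcode, sequence = read[:BARCODE_LEN], read[BARCODE_LEN:]
    if barcode in barcode_to_sample:
        demuxed[barcode_to_sample[barcode]].append(sequence)
    else:
        discarded.append(read)  # sample of origin could not be determined

print(demuxed)    # {'sampleA': ['TTGACA'], 'sampleB': ['CCATGA']}
print(discarded)  # ['TTTGACCTA']
```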
The process of demultiplexing (as it occurs in QIIME 2) will look something like Figure 3 (ignore the right-hand side of this flow chart for now).

Figure 3:Flowchart of demultiplexing and denoising workflows in QIIME 2.
This flowchart describes all demultiplexing steps that are currently possible in QIIME 2, depending on the type of raw data you have imported.
Usually only one of the different demultiplexing actions available in q2-demux or q2-cutadapt will be applicable for your data, and that is all you will need.
Read more about demultiplexing and give it a spin with the Moving Pictures tutorial 🎥, which covers Earth Microbiome Project format data. If instead you have barcodes and primers in-line in your reads, see the cutadapt tutorials.
Have dual-indexed reads or mixed-orientation reads or some other unusual format? Search the QIIME 2 Forum for advice.
Paired-end read joining¶
If you’re working with Illumina paired-end reads, they will typically need to be joined at some point in the analysis. If you read How to merge Illumina paired-end reads, you will see that this happens automatically during denoising with q2-dada2. However, if you want to use q2-deblur or an OTU clustering method (as described in more detail below), use q2-vsearch to join these reads before proceeding, as shown in Figure 3.
If you are beginning to pull out your hair and foam at the mouth, do not despair: QIIME 2 tends to get easier the further we travel in the “general overview” (Figure 2). Importing and demultiplexing raw sequencing data happens to be the most frustrating part for most new users because there are so many different ways that marker gene data can be generated. But once you get the hang of it, it’s a piece of cake. 🍰
Denoising and clustering¶
Congratulations on getting this far! Denoising and clustering steps are slightly less confusing than importing and demultiplexing! 🎉😬🎉
The names for these steps are very descriptive:
- We denoise our sequences to remove and/or correct noisy reads. 🔊
- We dereplicate our sequences to reduce repetition and file size/memory requirements in downstream steps (don’t worry! we keep count of each replicate). 🕵️
- We (optionally) cluster sequences to collapse similar sequences (e.g., those that are ≥ 97% similar to each other) into single replicate sequences. This process, also known as OTU picking, was once a common procedure used to simultaneously dereplicate and perform a sort of quick-and-dirty denoising (to capture stochastic sequencing and PCR errors, which should be rare and similar to more abundant centroid sequences). Skip clustering in favor of denoising unless you have a really strong reason not to.
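To make dereplication concrete, here is a toy Python sketch (not QIIME 2 code; the reads are invented) showing how identical reads collapse into unique sequences while their counts are preserved:

```python
# Toy illustration (not QIIME 2 code): dereplication collapses identical
# reads into unique sequences while keeping track of their counts.
from collections import Counter

reads = ["ACGT", "ACGT", "ACGA", "ACGT", "ACGA", "TTTT"]
dereplicated = Counter(reads)

# Three unique sequences remain, with their original abundances intact.
print(dereplicated.most_common())
# [('ACGT', 3), ('ACGA', 2), ('TTTT', 1)]
```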
Denoising¶
Let’s start with denoising, which is depicted on the right-hand side of Figure 3.
The denoising methods currently available in QIIME 2 include DADA2 and Deblur.
You can learn more about those methods by reading the original publications for each.
Examples of using both are presented in the Moving Pictures tutorial 🎥. Note that Deblur (and also vsearch dereplicate-sequences) should be preceded by basic quality-score-based filtering, but this is unnecessary for DADA2.
Both Deblur and DADA2 contain internal chimera checking methods and abundance filtering, so additional filtering should not be necessary following these methods.
🦁🐐🐍
To put it simply, these methods filter out noisy sequences, correct errors in marginal sequences (in the case of DADA2), remove chimeric sequences, remove singletons, join denoised paired-end reads (in the case of DADA2), and then dereplicate those sequences. 😎
The features produced by denoising methods go by many names, usually some variant of “sequence variant” (SV), “amplicon SV” (ASV), “actual SV”, “exact SV”... We tend to use amplicon sequence variant (ASV) in the QIIME 2 documentation, and we’ll stick with that here. 📏
Clustering¶
Next we will discuss clustering methods. Dereplication (the simplest clustering method, effectively producing 100% OTUs, i.e., all unique sequences observed in the dataset) is depicted in Figure 4, and is the necessary starting point for all other clustering methods in QIIME 2.

Figure 4:Flowchart of OTU clustering, chimera filtering, and abundance filtering workflows in QIIME 2.
q2-vsearch implements three different OTU clustering strategies: de novo, closed reference, and open reference. All should be preceded by basic quality-score-based filtering and followed by chimera filtering and aggressive OTU filtering (the treacherous trio, a.k.a. the Bokulich method). 🙈🙉🙊
A separate tutorial demonstrates use of several q2-vsearch clustering methods. Don’t forget to read the chimera filtering tutorial as well.
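For intuition only, here is a toy Python sketch of greedy de novo clustering (this is not the actual vsearch algorithm; the identity function, threshold, and sequences are deliberately simplistic and invented):

```python
# Toy illustration (not QIIME 2 or vsearch code): greedy de novo OTU
# clustering. Each sequence (processed in order of decreasing abundance)
# either joins the first existing centroid it is similar enough to, or
# founds a new cluster.

def identity(a, b):
    """Fraction of matching positions between two sequences (naive)."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))

def cluster(seqs, threshold=0.75):
    centroids = []
    assignments = {}
    for seq in seqs:
        for centroid in centroids:
            if identity(seq, centroid) >= threshold:
                assignments[seq] = centroid  # join an existing cluster
                break
        else:
            centroids.append(seq)            # found a new cluster
            assignments[seq] = seq
    return assignments

# Sequences sorted by abundance (most abundant first).
assignments = cluster(["ACGT", "ACGA", "TTTT"])
print(assignments)  # {'ACGT': 'ACGT', 'ACGA': 'ACGT', 'TTTT': 'TTTT'}
```

Real clusterers use proper pairwise alignment and much higher thresholds (e.g., 97%), but the greedy centroid idea is the same.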
The feature table¶
The final products of all denoising and clustering methods/workflows are a FeatureTable (feature table) artifact and a FeatureData[Sequence] (representative sequences) artifact.
These are two of the most important artifact classes in a marker gene sequencing workflow, and are used for many downstream analyses, as discussed below.
Indeed, feature tables are crucial to any QIIME 2 analysis, as the central record of the counts of features per sample.
Such an important artifact deserves its own powerful plugin:
q2-feature-table plugin documentation
feature-table¶
This is a QIIME 2 plugin supporting operations on sample by feature tables, such as filtering, merging, and transforming tables.
- version: 2024.10.0
- website: https://github.com/qiime2/q2-feature-table
- user support: Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
Actions¶
Name | Type | Short Description |
---|---|---|
rarefy | method | Rarefy table |
subsample-ids | method | Subsample table |
presence-absence | method | Convert to presence/absence |
relative-frequency | method | Convert to relative frequencies |
transpose | method | Transpose a feature table. |
group | method | Group samples or features by a metadata column |
merge | method | Combine multiple tables |
merge-seqs | method | Combine collections of feature sequences |
merge-taxa | method | Combine collections of feature taxonomies |
rename-ids | method | Renames sample or feature ids in a table |
filter-samples | method | Filter samples from table |
filter-features-conditionally | method | Filter features from a table based on abundance and prevalence |
filter-features | method | Filter features from table |
filter-seqs | method | Filter features from sequences |
split | method | Split one feature table into many |
tabulate-feature-frequencies | method | Tabulate feature frequencies |
tabulate-sample-frequencies | method | Tabulate sample frequencies |
summarize | visualizer | Summarize table |
tabulate-seqs | visualizer | View sequence associated with each feature |
core-features | visualizer | Identify core features in table |
heatmap | visualizer | Generate a heatmap representation of a feature table |
summarize-plus | pipeline | Summarize table plus |
We will not discuss all of this plugin’s actions in detail here (some are mentioned below), but it performs many useful operations on feature tables, so familiarize yourself with its documentation!
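As a toy illustration of the spirit of two of those actions — relative-frequency and presence-absence — here is plain Python (not the q2-feature-table implementation) applied to an invented table:

```python
# Toy illustration (not QIIME 2 code): two common feature-table transforms,
# analogous in spirit to feature-table relative-frequency and
# feature-table presence-absence.
table = {
    "sampleA": {"asv1": 10, "asv2": 0, "asv3": 30},
    "sampleB": {"asv1": 5, "asv2": 5, "asv3": 0},
}

# Convert counts to per-sample relative frequencies (rows sum to 1).
relative = {
    s: {f: n / sum(counts.values()) for f, n in counts.items()}
    for s, counts in table.items()
}

# Convert counts to presence/absence (was the feature observed at all?).
presence_absence = {
    s: {f: n > 0 for f, n in counts.items()} for s, counts in table.items()
}

print(relative["sampleA"])          # {'asv1': 0.25, 'asv2': 0.0, 'asv3': 0.75}
print(presence_absence["sampleB"])  # {'asv1': True, 'asv2': True, 'asv3': False}
```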
Congratulations! You’ve made it through importing, demultiplexing, and denoising/clustering your data, which are the most complicated and difficult steps for most users (if only because there are so many ways to do it!). If you’ve made it this far, the rest should be easy. Now begins the fun. 🍾
Taxonomy classification (or annotation) and taxonomic analyses¶
For many experiments, investigators aim to identify the organisms that are present in a sample. For example:
- How do the genera or species in a system change over time?
- Are there any potential human pathogens in this patient’s sample?
- What’s swimming in my wine? 🍷🤑
We can do this by comparing our feature sequences (be they ASVs or OTUs) to a reference database of sequences with known taxonomic composition. Simply finding the closest alignment is not really good enough, because other sequences that are equally close matches, or nearly as close, may have different taxonomic annotations. So we use taxonomy classifiers to determine the closest taxonomic affiliation with some degree of confidence or consensus (which may not be a species name if one cannot be predicted with certainty!), based on alignment, k-mer frequencies, etc. Those interested in learning more about the relative performance of the taxonomy classifiers in QIIME 2 can read until the cows come home. And if you want to learn about how the algorithms work, you can refer to the Sequence Homology Searching chapter of An Introduction to Applied Bioinformatics. 🐄🐄🐄
Figure 5 shows what a taxonomy classification workflow might look like.

Figure 5:Flowchart of taxonomic annotation workflows in QIIME 2.
Alignment-based taxonomic classification¶
q2-feature-classifier contains three different classification methods. classify-consensus-blast and classify-consensus-vsearch are both alignment-based methods that find a consensus assignment across N top hits. These methods take reference database FeatureData[Taxonomy] and FeatureData[Sequence] files directly, and do not need to be pre-trained.
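For intuition, here is a toy Python sketch of consensus assignment (this is not the actual BLAST/VSEARCH consensus logic — the real methods use a configurable consensus fraction rather than strict unanimity — and the lineages are invented):

```python
# Toy illustration (not QIIME 2 code): consensus taxonomy assignment.
# Given the taxonomic lineages of a query's top database hits, keep only
# the leading ranks on which all hits agree.

def consensus(lineages):
    """Longest shared prefix of the hits' lineages (lists of ranks)."""
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) == 1:
            shared.append(ranks[0])
        else:
            break  # hits disagree at this rank; stop here
    return ";".join(shared)

top_hits = [
    ["Bacteria", "Firmicutes", "Bacilli", "Lactobacillus"],
    ["Bacteria", "Firmicutes", "Bacilli", "Streptococcus"],
    ["Bacteria", "Firmicutes", "Bacilli", "Lactobacillus"],
]
print(consensus(top_hits))  # Bacteria;Firmicutes;Bacilli
```

Note how the genus is dropped because the top hits disagree — this is why consensus classifiers sometimes return a classification that stops above species level.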
Machine-learning-based taxonomic classification¶
Machine-learning-based classification methods are available through classify-sklearn, and can theoretically apply any of the classification methods available in scikit-learn.
These classifiers must be trained, e.g., to learn which features best distinguish each taxonomic group, adding an additional step to the classification process.
Classifier training is reference database- and marker-gene-specific and only needs to happen once per marker-gene/reference database combination; that classifier may then be re-used as many times as you like without needing to re-train!
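For intuition only, here is a toy Python sketch of k-mer-based Naive Bayes classification (this is not classify-sklearn itself; the reference sequences, taxa, k-mer size, smoothing, and unseen-k-mer penalty are all invented and deliberately tiny):

```python
# Toy illustration (not QIIME 2 code): the shape of k-mer-based Naive Bayes
# taxonomy classification. "Training" tabulates per-taxon k-mer
# log-probabilities once; classification then just sums them per query.
import math
from collections import Counter

def kmers(seq, k=2):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def train(reference):
    """reference: mapping of taxon -> list of training sequences."""
    model = {}
    for taxon, seqs in reference.items():
        counts = Counter(km for s in seqs for km in kmers(s))
        total = sum(counts.values())
        # Log-probability of each observed k-mer, with add-one smoothing
        # over the 16 possible dinucleotides.
        model[taxon] = {km: math.log((c + 1) / (total + 16))
                        for km, c in counts.items()}
    return model

def classify(model, query, unseen=math.log(1e-3)):
    """Return the taxon with the highest summed k-mer log-probability."""
    def score(taxon):
        return sum(model[taxon].get(km, unseen) for km in kmers(query))
    return max(model, key=score)

model = train({
    "Lactobacillus": ["ACACAC", "ACACAA"],
    "Streptococcus": ["GTGTGT", "GTGTGG"],
})
print(classify(model, "ACACGT"))  # Lactobacillus
```

The expensive part is train(); once the model exists, it can be reused for every query — which is exactly why pre-trained classifiers are so convenient.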
Training your own feature classifiers¶
If you’re working with an uncommon marker gene, you may need to train your own feature classifier.
This is possible following the steps in the classifier training tutorial.
The rescript plugin also contains many tools that can be useful in preparing reference data for training classifiers. Most users don’t need to train their own classifiers, however, as the QIIME 2 developers provide classifiers for common marker genes in the QIIME 2 Library.
🎅🎁🎅🎁🎅🎁
Environment-weighted classifiers¶
Typical Naive Bayes classifiers treat all reference sequences as being equally likely to be observed in a sample. Environment-weighted taxonomic classifiers, on the other hand, use public microbiome data to weight taxa by their past frequency of being observed in specific sample types. This can improve the accuracy and the resolution of marker gene classification, and we recommend using weighted classifiers when possible. You can find environment-weighted classifiers for 16S rRNA in the QIIME 2 Library. If the environment type that you’re studying isn’t one of the ones that pre-trained classifiers are provided for, the “diverse weighted” classifiers may still be relevant. These are trained on weights from multiple different environment types, and have been shown to perform better than classifiers that assume equal weights for all taxa.
Which feature classification method is best?¶
They are all pretty good, otherwise we wouldn’t bother exposing them in q2-feature-classifier. But in general, classify-sklearn with a Naive Bayes classifier can slightly outperform other methods we’ve tested, based on several criteria, for classification of 16S rRNA gene and fungal ITS sequences.
It can be more difficult and frustrating for some users, however, since it requires that additional training step.
That training step can be memory intensive, becoming a barrier for some users who are unable to use the pre-trained classifiers.
Some users also prefer the alignment-based methods because their mode of operation is much more transparent and their parameters easier to manipulate.
Feature classification can be slow¶
Runtime of feature classifiers is a function of the number of sequences to be classified and the number of reference sequences. If runtime is an issue for you, consider filtering low-abundance features out of your sequences file before classifying (e.g., those that are present in only a single sample), and use smaller reference databases if possible. In practice, in “normal size” sequencing experiments (whatever that means 😜) we see variations between a few minutes (a few hundred features) to hours or days (hundreds of thousands of features) for classification to complete. If you want to hang some numbers on there, check out our benchmarks for classifier runtime performance. 🏃⏱️
Feature classification can be memory intensive¶
Generally at least 8 GB of RAM are required, though 16 GB is better. Memory use is generally related to the size of the reference database, and in some cases 32 GB of RAM or more are required.
Examples of using classify-sklearn are shown in the Moving Pictures tutorial 🎥.
Figure 5 should make the other classifier methods reasonably clear.
All classifiers produce a FeatureData[Taxonomy] artifact, tabulating the taxonomic annotation for each query sequence.
If you want to review those, or compare them across different classifiers, refer back to Reviewing information about observed sequences.
Taxonomic analysis¶
Taxonomic classification opens us up to a whole new world of possibilities. 🌎
Here are some popular actions that are enabled by having a FeatureData[Taxonomy] artifact:
- Collapse your feature table with taxa collapse! This groups all features that share the same taxonomic assignment into a single feature. That taxonomic assignment becomes the feature ID in the new feature table. This feature table can be used in all the same ways as the original. Some users may be specifically interested in performing, e.g., taxonomy-based diversity analyses, but at the very least anyone assigning taxonomy is probably interested in assessing differential abundance of those taxa. Comparing differential abundance analyses using taxa as features versus using ASVs or OTUs as features can be diagnostic and informative for various analyses.
- Plot your taxonomic composition to see the abundance of various taxa in each of your samples. Check out taxa barplot and feature-table heatmap for more details. 📊
- Filter your feature table and feature sequences to remove certain taxonomic groups. This is useful for removing known contaminants or non-target groups, e.g., host DNA including mitochondrial or chloroplast sequences. It can also be useful for focusing on specific groups for deeper analysis. See the filtering tutorial for more details and examples. 🌿🐀
Sequence alignment and phylogenetic reconstruction¶
Some diversity metrics - notably Faith’s Phylogenetic Diversity (PD) and UniFrac - integrate the phylogenetic similarity of features. If you are sequencing phylogenetic markers (e.g., 16S rRNA genes), you can build a phylogenetic tree that can be used for computing these metrics.
The different options for aligning sequences and producing a phylogeny are shown in the flowchart below, and can be classified as de novo or reference-based. For a detailed discussion of alignment and phylogeny building, see the q2-phylogeny tutorial and q2-fragment-insertion. 🌳

Figure 6:Flowchart of alignment and phylogenetic reconstruction workflows in QIIME 2.
Now that we have our rooted phylogenetic tree (i.e., an artifact of class Phylogeny[Rooted]), let’s use it!
Diversity analysis¶
In microbiome experiments, investigators frequently wonder about things like:
- How many different species/OTUs/ASVs are present in my samples?
- Which of my samples represent more phylogenetic diversity?
- Does the microbiome composition of my samples differ based on sample categories (e.g., healthy versus disease)?
- What factors (e.g., pH, elevation, blood pressure, body site, or host species just to name a few examples) are correlated with differences in microbial composition and biodiversity?
These questions can be answered by alpha- and beta-diversity analyses. Alpha diversity measures the level of diversity within individual samples. Beta diversity measures assess the dissimilarity between samples. We can then use this information to statistically test whether alpha diversity is different between groups of samples (indicating, for example, that those groups have more/less species richness) and whether beta diversity is greater across groups (indicating, for example, that samples within a group are more similar to each other than those in another group, suggesting that membership within these groups is shaping the microbial composition of those samples).
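To ground these concepts, here is a toy Python sketch of one alpha diversity metric (Shannon entropy, computed within a sample) and one beta diversity metric (Bray-Curtis dissimilarity, computed between a pair of samples); the count data are invented, and this is not QIIME 2 code:

```python
# Toy illustration (not QIIME 2 code): one alpha and one beta diversity
# metric computed on toy per-sample feature counts.
import math

def shannon(counts):
    """Alpha diversity: Shannon entropy (base 2) of a sample's counts."""
    total = sum(counts)
    freqs = [c / total for c in counts if c > 0]
    return -sum(f * math.log(f, 2) for f in freqs)

def bray_curtis(a, b):
    """Beta diversity: Bray-Curtis dissimilarity between two samples."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))

sampleA = [10, 10, 0]   # counts of three features in sample A
sampleB = [0, 10, 10]   # counts of the same three features in sample B

print(round(shannon(sampleA), 3))               # 1.0 (two equally abundant features)
print(round(bray_curtis(sampleA, sampleB), 3))  # 0.5 (half the abundance is shared)
```

Phylogenetic metrics like Faith’s PD and UniFrac follow the same pattern but additionally weight features by their placement on the phylogenetic tree.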
Different types of diversity analyses in QIIME 2 are exemplified in the Moving Pictures tutorial 🎥. The actions used to generate diversity artifacts are shown in Figure 7, and many other tools can operate on these results.

Figure 7:Flowchart of diversity analysis workflows in QIIME 2.
The q2-diversity plugin contains many different useful actions. Check them out to learn more.
As you can see in the flowchart, the diversity core-metrics* pipelines (core-metrics and core-metrics-phylogenetic) encompass many different core diversity commands, and in the process produce the main diversity-related artifacts that can be used in downstream analyses. These are:
- SampleData[AlphaDiversity] artifacts, which contain alpha diversity estimates for each sample in your feature table. This is the chief artifact for alpha diversity analyses.
- DistanceMatrix artifacts, containing the pairwise distance/dissimilarity between each pair of samples in your feature table. This is the chief artifact for beta diversity analyses.
- PCoAResults artifacts, containing principal coordinates ordination results for each distance/dissimilarity metric. Principal coordinates analysis is a dimension reduction technique, facilitating visual comparisons of sample (dis)similarities in 2D or 3D space. Learn more about ordination in Ordination Methods for Ecologists and in the Machine learning in bioinformatics section of An Introduction to Applied Bioinformatics (Bolyen et al., 2018).
These are the main diversity-related artifacts.
We can re-use these data in all sorts of downstream analyses, or in the various actions of q2-diversity shown in the flowchart.
Many of these actions are demonstrated in the Moving Pictures tutorial 🎥 so head on over there to learn more!
Note that there are many different alpha- and beta-diversity metrics that are available in QIIME 2. To learn more (and figure out whose paper you should be citing!), check out that neat resource, which was contributed by a friendly QIIME 2 user to enlighten all of us. Thanks Stephanie! 😁🙏😁🙏😁🙏
Fun with feature tables¶
At this point you have a feature table, taxonomy classification results, alpha diversity, and beta diversity results. Oh my! 🦁🐯🐻
Taxonomic and diversity analyses, as described above, are the basic types of analyses that most QIIME 2 users are probably going to need to perform at some point. However, this is only the beginning, and there are so many more advanced analyses at our fingertips. 🖐️⌨️

Figure 8:Flowchart of “downstream” analysis workflows in QIIME 2. Note: this figure needs some updates; specifically, gneiss was deprecated and is no longer part of the amplicon distribution.
We are only going to give a brief overview, since each of these analyses has its own in-depth tutorial to guide us:
- Analyze longitudinal data: q2-longitudinal is a plugin for performing statistical analyses of longitudinal experiments, i.e., where samples are collected from individual patients/subjects/sites repeatedly over time. This includes longitudinal studies of alpha and beta diversity, and some really awesome, interactive plots. 📈🍝
- Predict the future (or the past) 🔮: q2-sample-classifier is a plugin for machine-learning 🤖 analyses of feature data. Both classification and regression models are supported. This allows you to do things like:
  - predict sample metadata as a function of feature data (e.g., can we use a fecal sample to predict cancer susceptibility? Or predict wine quality based on the microbial composition of grapes before fermentation?). 🍇
  - identify features that are predictive of different sample characteristics. 🚀
  - quantify rates of microbial maturation (e.g., to track normal microbiome development in the infant gut and the impacts of persistent malnutrition or antibiotics, diet, and delivery mode). 👶
  - predict outliers and mislabeled samples. 👹
- Differential abundance testing is used to determine which features are significantly more/less abundant in different groups of samples. QIIME 2 currently supports a few different approaches to differential abundance testing, including ancom-bc in q2-composition. 👾👾👾
- Evaluate and control data quality: q2-quality-control is a plugin for evaluating and controlling sequence data quality. This includes actions that:
  - test the accuracy of different bioinformatic or molecular methods, or of run-to-run quality variation. These actions are typically used if users have samples with known compositions, e.g., mock communities, since accuracy is calculated as the similarity between the observed and expected compositions, sequences, etc. But more creative uses may be possible...
  - filter sequences based on alignment to a reference database, or that contain specific short sections of DNA (e.g., primer sequences). This is useful for removing sequences that match a specific group of organisms, non-target DNA, or other nonsense. 🙃
And that’s just a brief overview! QIIME 2 continues to grow, so stay tuned for more plugins in future releases 📻, and keep your eyes peeled for stand-alone plugins that will continue to expand the functionality available in QIIME 2.
A good next step is to work through the Moving Pictures tutorial 🎥, if you haven’t done so already. That will help you learn how to actually use all of the functionality discussed here on real microbiome sequence data.
Now go forth and have fun! 💃
This is a guide for novice QIIME 2 users, and particularly for those who are new to microbiome research. For experienced users who are already well versed in microbiome analysis (and those who are averse to uncontrolled use of emoji) mosey on over to .
Welcome all newcomers! 👋 This guide will give you a conceptual overview of many of the plugins and actions available in QIIME 2, and guide you to relevant documentation for deeper exploration. As an Explanation article, this document doesn’t provide specific commands to run, but rather discusses at a higher level what your analysis workflow might entail. If you want specific commands that you can run and then adapt for your own work, our Tutorial articles are more aligned with what you’re looking for. We generally recommend starting with the Moving Pictures tutorial 🎥.
Consider this document to be your treasure map: QIIME 2 actions are the stepping stones on your path, and the flowcharts below will tell you where all the goodies are buried. 🗺️
Remember, many paths lead from the foot of the mountain, but at the peak we all gaze at the same moon. 🌕
Let’s get oriented¶
Flowcharts¶
Before we begin talking about specific plugins and actions, we will discuss a conceptual overview of a typical workflow for analyzing marker gene sequence data. And before we look at that overview, we must look at the key to our treasure map:

Figure 1:Each type of result (i.e., Artifacts and Visualizations) and action (i.e., methods, visualizers, and pipelines) is represented by a different color-coded node. The edges connecting each node are either solid (representing either required input or output) or dashed (representing optional input).
In the flowcharts below:
- Actions are labeled with the name of the plugin and the name of the action. To learn more about how to use a specific plugin and action, you can look it up in Available plugins.
- Artifacts are labeled by their artifact class.
- Visualizations are variously labeled as “visualization,” some name that represents the information shown in that visualization, or replaced with an image representing some of the tasty information you might find inside that visualization... 🍙
Useful points for beginners¶
Just a few more important points before we go further:
- The guide below is not exhaustive by any means.
It only covers some of the chief actions in the QIIME 2 amplicon distribution.
There are many more actions and plugins to discover.
Curious to learn more?
Refer to Available plugins, or if you’re working on the command line, call
qiime --help
. - The flowcharts below are designed to be as simple as possible, and hence omit many of the inputs (particularly optional inputs and metadata) and outputs (particularly statistical summaries and other minor outputs) and all of the possible parameters from most actions. Many additional actions (e.g., for displaying statistical summaries or fiddling with feature tables 🎻) are also omitted. Now that you know all about the help documentation (Available plugins), use it to learn more about individual actions, and other actions present in a plugin (hint: if a plugin has additional actions not described here, they are probably used to examine the output of other actions in that plugin).
- Metadata is a central concept in QIIME 2. We do not extensively discuss metadata in this guide. Instead, find discussion of metadata in .
- There is no one way to do things in QIIME 2. Nor is there a “QIIME 2” approach. Many paths lead from the foot of the mountain... ⛰️ Many of the plugins and actions in QIIME 2 wrap independent software or pre-existing methods. The QIIME 2 Framework (Q2F), discussed in Using QIIME 2, is the glue that makes the magic happen.
- Do not forget to cite appropriately! Unsure what to cite? To see the a plugin or method’s relevant citations, refer its help text. Or, better yet, view an artifact or visualization using QIIME 2 View. The “citations” tab will contain information on all relevant citations used for the generation of that file. Groovy. 😎
💃💃💃
Conceptual overview¶
Now let us examine a conceptual overview of the various possible workflows for examining marker gene sequence data Figure 2. QIIME 2 allows you to enter or exit anywhere you’d like, so you can use QIIME 2 for any or all of these steps.

Figure 2:Flowchart providing an overview of a typical QIIME 2-based microbiome marker gene analysis workflow. The edges and nodes in this overview do not represent specific actions or data types, but instead represent conceptual categories, e.g., the basic types of data or analytical goals we might have in an experiment. Discussion of these steps and terms follows.
All data must be imported into a QIIME 2 artifact to be used by a QIIME 2 action (with the exception of metadata).
Most users start with either multiplexed (e.g., between one and three FASTQ files) or demuliplexed (e.g., a collection of n
.fastq
files, where n
is the number of samples, or two-times the number of samples) raw sequence data.
If possible, we recommend starting with demultiplexed sequence data - this prevents you from having to understand how sequences were multiplexed and how they need to be demultiplexed.
Whoever did your sequencing should already have that information and know how to do this.
Others users may start downstream, because some data processing has already been performed. For example, you can also start your QIIME 2 analysis with a feature table (.biom
or .tsv
file) generated with some other tool.
How to import and export data helps you identify what type of data you have, and provides specific instructions on importing different types of data.
Now that we understand that we can actually enter into this overview workflow at nearly any of the nodes, let us walk through individual sections.
- All marker gene sequencing experiments begin, at some point or another, as multiplexed sequence data. This is probably in .fastq files that contain DNA sequences and quality scores for each base.
- The sequence data must be demultiplexed, such that each observed sequence read is associated with the sample that it was observed in, or discarded if its sample of origin could not be determined.
- Reads then undergo quality control (i.e., denoising), and amplicon sequence variants (ASVs) or operational taxonomic units (OTUs) should be defined. The goals of these steps are to remove sequencing errors and to dereplicate sequences to make downstream analyses more performant. These steps result in: a. a feature table that tabulates counts of ASVs (or OTUs) on a per-sample basis, and b. feature sequences, a mapping of ASV (or OTU) identifiers to the sequences they represent.
These artifacts (the feature table and feature sequences) are central to most downstream analyses. Common analyses include:
- Taxonomic annotation of sequences, which lets you determine which taxa (e.g., species, genera, phyla) are present.
- Alpha and beta diversity analyses, or measures of diversity within and between samples, respectively. These enable assessment of how similar or different samples are to one another. Some diversity metrics integrate measures of phylogenetic similarity between individual features. If you are sequencing phylogenetic markers (e.g., 16S rRNA genes), you can construct a phylogenetic tree from your feature sequences to use when calculating phylogenetic diversity metrics.
- Differential abundance testing, to determine which features (OTUs, ASVs, taxa, etc) are significantly more/less abundant in different experimental groups.
This is just the beginning, and many other statistical tests and plotting methods are at your fingertips in QIIME 2 and in the lands beyond. The world is your oyster. Let’s dive in. 🏊
Demultiplexing¶
Okay! Imagine we have just received some FASTQ data, hot off the sequencing instrument. Most next-gen sequencing instruments have the capacity to analyze hundreds or even thousands of samples in a single lane/run; we do so by multiplexing these samples, which is just a fancy word for mixing a whole bunch of stuff together. How do we know which sample each read came from? This is typically done by appending a unique barcode (a.k.a. index or tag) sequence to one or both ends of each sequence. Detecting these barcode sequences and mapping them back to the samples they belong to allows us to demultiplex our sequences.
You (or whoever prepared and sequenced your samples) should know which barcode is associated with each sample -- if you do not know, talk to your lab mates or sequencing center. Include this barcode information in your sample metadata file.
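To make the idea concrete, here is a minimal plain-Python sketch of demultiplexing. The barcodes and sample names are made up, and real demultiplexers (like those in q2-demux) also handle barcode sequencing errors, quality scores, and file I/O:

```python
# Map each barcode to its sample, as recorded in the sample metadata file.
# (These barcodes and sample IDs are hypothetical.)
barcode_to_sample = {
    "ACGT": "sample-1",
    "TGCA": "sample-2",
}

def demultiplex(reads, barcode_len=4):
    """Assign each read to a sample by its leading barcode.

    Returns a dict of sample -> list of reads (barcode trimmed), plus a
    list of reads whose barcode matched no sample (discarded).
    """
    per_sample = {s: [] for s in barcode_to_sample.values()}
    unassigned = []
    for read in reads:
        barcode, sequence = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read)  # sample of origin unknown -> discard
        else:
            per_sample[sample].append(sequence)
    return per_sample, unassigned

reads = ["ACGTTTTTGG", "TGCAAAACCC", "GGGGAAAAAA"]
per_sample, unassigned = demultiplex(reads)
```

The whole trick is that lookup table, which is exactly why the barcode-to-sample mapping must live in your sample metadata file.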
The process of demultiplexing (as it occurs in QIIME 2) will look something like Figure 3 (ignore the right-hand side of this flow chart for now).

Figure 3:Flowchart of demultiplexing and denoising workflows in QIIME 2.
This flowchart describes all demultiplexing steps that are currently possible in QIIME 2, depending on the type of raw data you have imported.
Usually only one of the different demultiplexing actions available in q2-demux or q2-cutadapt will be applicable for your data, and that is all you will need.
Read more about demultiplexing and give it a spin with the Moving Pictures tutorial 🎥. That tutorial covers Earth Microbiome Project format data.
If instead you have barcodes and primers in-line in your reads, see the cutadapt tutorials.
Have dual-indexed reads or mixed-orientation reads or some other unusual format? Search the QIIME 2 Forum for advice.
Paired-end read joining¶
If you’re working with Illumina paired-end reads, they will typically need to be joined at some point in the analysis.
If you read How to merge Illumina paired-end reads, you will see that this happens automatically during denoising with q2-dada2. However, if you want to use q2-deblur or an OTU clustering method (as described in more detail below), use q2-vsearch to join these reads before proceeding, as shown in Figure 3.
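Conceptually, read joining reverse-complements the reverse read and merges it with the forward read where the two overlap. A toy sketch of the idea on a hypothetical 12 bp amplicon (real joiners, like vsearch, also reconcile quality scores and tolerate mismatches in the overlap):

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq))

def join_pair(fwd, rev, min_overlap=4):
    """Join a forward and a reverse read by their overlapping ends.

    The reverse read is reverse-complemented, then we look for the longest
    suffix of the forward read that exactly matches a prefix of it.
    Returns the merged sequence, or None if no sufficient overlap exists.
    """
    rc = revcomp(rev)
    max_olap = min(len(fwd), len(rc))
    for olap in range(max_olap, min_overlap - 1, -1):
        if fwd[-olap:] == rc[:olap]:
            return fwd + rc[olap:]
    return None

# Hypothetical amplicon AAATTTGGGCCC; each read covers one end with overlap.
fwd = "AAATTTGGG"  # covers the 5' end
rev = "GGGCCCAAA"  # reverse read: revcomp of the 3' end, TTTGGGCCC
merged = join_pair(fwd, rev)
```

Here the two reads share a 6-base overlap (TTTGGG), so the pair merges back into the full amplicon.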
If you are beginning to pull out your hair and foam at the mouth, do not despair: QIIME 2 tends to get easier the further we travel in the “general overview” (Figure 2). Importing and demultiplexing raw sequencing data happens to be the most frustrating part for most new users because there are so many different ways that marker gene data can be generated. But once you get the hang of it, it’s a piece of cake. 🍰
Denoising and clustering¶
Congratulations on getting this far! Denoising and clustering steps are slightly less confusing than importing and demultiplexing! 🎉😬🎉
The names for these steps are very descriptive:
- We denoise our sequences to remove and/or correct noisy reads. 🔊
- We dereplicate our sequences to reduce repetition and file size/memory requirements in downstream steps (don’t worry! we keep count of each replicate). 🕵️
- We (optionally) cluster sequences to collapse similar sequences (e.g., those that are ≥ 97% similar to each other) into single replicate sequences. This process, also known as OTU picking, was once a common procedure, used to simultaneously dereplicate but also perform a sort of quick-and-dirty denoising procedure (to capture stochastic sequencing and PCR errors, which should be rare and similar to more abundant centroid sequences). Skip clustering in favor of denoising, unless you have a really strong reason not to.
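Dereplication is the easiest of these to picture: collapse identical sequences while keeping per-sample counts. A minimal sketch on made-up sequences (feature IDs here are arbitrary f0, f1, ...; real tools typically derive IDs by hashing the sequence):

```python
from collections import Counter

def dereplicate(seqs_per_sample):
    """Collapse identical sequences, keeping per-sample counts.

    Returns (table, features): `table` maps sample -> {feature_id: count},
    and `features` maps feature_id -> sequence.
    """
    feature_ids = {}  # sequence -> feature id
    features = {}     # feature id -> sequence
    table = {}
    for sample, seqs in seqs_per_sample.items():
        row = {}
        for seq, n in Counter(seqs).items():
            if seq not in feature_ids:
                fid = f"f{len(feature_ids)}"  # arbitrary id for this sketch
                feature_ids[seq] = fid
                features[fid] = seq
            row[feature_ids[seq]] = n
        table[sample] = row
    return table, features

table, features = dereplicate({
    "sample-1": ["ACGT", "ACGT", "TTTT"],
    "sample-2": ["TTTT"],
})
```

Notice that no information about abundance is lost: the counts move into the feature table, which is exactly the structure described below.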
Denoising¶
Let’s start with denoising, which is depicted on the right-hand side of Figure 3.
The denoising methods currently available in QIIME 2 include DADA2 and Deblur.
You can learn more about those methods by reading the original publications for each.
Examples of using both are presented in Moving Pictures tutorial 🎥.
Note that deblur (and also vsearch dereplicate-sequences) should be preceded by basic quality-score-based filtering, but this is unnecessary for DADA2.
Both Deblur and DADA2 contain internal chimera checking methods and abundance filtering, so additional filtering should not be necessary following these methods.
🦁🐐🐍
To put it simply, these methods filter out noisy sequences, correct errors in marginal sequences (in the case of DADA2), remove chimeric sequences, remove singletons, join denoised paired-end reads (in the case of DADA2), and then dereplicate those sequences. 😎
The features produced by denoising methods go by many names, usually some variant of “sequence variant” (SV), “amplicon SV” (ASV), “actual SV”, “exact SV”... We tend to use amplicon sequence variant (ASV) in the QIIME 2 documentation, and we’ll stick with that here. 📏
Clustering¶
Next we will discuss clustering methods. Dereplication (the simplest clustering method, effectively producing 100% OTUs, i.e., all unique sequences observed in the dataset) is depicted in Figure 4, and is the necessary starting point for all other clustering methods in QIIME 2.

Figure 4:Flowchart of OTU clustering, chimera filtering, and abundance filtering workflows in QIIME 2.
q2-vsearch implements three different OTU clustering strategies: de novo, closed reference, and open reference.
All should be preceded by basic quality-score-based filtering and followed by chimera filtering and aggressive OTU filtering (the treacherous trio, a.k.a. the Bokulich method).
🙈🙉🙊
Several q2-vsearch clustering methods are demonstrated in a dedicated tutorial.
Don’t forget to read the chimera filtering tutorial as well.
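To illustrate what de novo clustering does, here is a toy greedy centroid clusterer. The identity function is deliberately crude (position-wise matches on made-up sequences, no alignment); real clustering in q2-vsearch computes identity from pairwise alignments:

```python
def identity(a, b):
    """Crude pairwise identity: fraction of matching positions.
    (Real clustering tools compute identity from an alignment.)"""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))

def cluster_de_novo(seqs_by_abundance, threshold=0.97):
    """Greedy de novo clustering: process sequences from most to least
    abundant; each sequence joins the first centroid it is >= threshold
    similar to, or founds a new cluster (and becomes its centroid)."""
    centroids = []    # cluster centroids, in the order they were founded
    assignments = {}  # sequence -> its cluster's centroid
    for seq in seqs_by_abundance:
        for c in centroids:
            if identity(seq, c) >= threshold:
                assignments[seq] = c
                break
        else:
            centroids.append(seq)
            assignments[seq] = seq
    return centroids, assignments

# Toy 100 bp sequences; s2 differs from s1 at 2 positions (98% identity).
s1 = "A" * 100
s2 = "A" * 98 + "TT"
s3 = "C" * 100
centroids, assignments = cluster_de_novo([s1, s2, s3], threshold=0.97)
```

At a 97% threshold, s2 collapses into s1’s cluster while s3 founds its own, which is also why clustering acts as a quick-and-dirty error sweep: rare error-bearing reads get absorbed by their abundant centroids.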
The feature table¶
The final products of all denoising and clustering methods/workflows are a FeatureTable (feature table) artifact and a FeatureData[Sequence] (representative sequences) artifact.
These are two of the most important artifact classes in a marker gene sequencing workflow, and are used for many downstream analyses, as discussed below.
Indeed, feature tables are crucial to any QIIME 2 analysis, as the central record of the counts of features per sample.
Such an important artifact deserves its own powerful plugin:
q2-feature-table plugin documentation
feature-table¶
This is a QIIME 2 plugin supporting operations on sample by feature tables, such as filtering, merging, and transforming tables.
- version:
2024.10.0
- website: https://github.com/qiime2/q2-feature-table
- user support: Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
Actions¶
Name | Type | Short Description |
---|---|---|
rarefy | method | Rarefy table |
subsample-ids | method | Subsample table |
presence-absence | method | Convert to presence/absence |
relative-frequency | method | Convert to relative frequencies |
transpose | method | Transpose a feature table. |
group | method | Group samples or features by a metadata column |
merge | method | Combine multiple tables |
merge-seqs | method | Combine collections of feature sequences |
merge-taxa | method | Combine collections of feature taxonomies |
rename-ids | method | Renames sample or feature ids in a table |
filter-samples | method | Filter samples from table |
filter-features-conditionally | method | Filter features from a table based on abundance and prevalence |
filter-features | method | Filter features from table |
filter-seqs | method | Filter features from sequences |
split | method | Split one feature table into many |
tabulate-feature-frequencies | method | Tabulate feature frequencies |
tabulate-sample-frequencies | method | Tabulate sample frequencies |
summarize | visualizer | Summarize table |
tabulate-seqs | visualizer | View sequence associated with each feature |
core-features | visualizer | Identify core features in table |
heatmap | visualizer | Generate a heatmap representation of a feature table |
summarize-plus | pipeline | Summarize table plus |
We will not discuss all actions of this plugin in detail here (some are mentioned below), but it performs many useful operations on feature tables, so familiarize yourself with its documentation!
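As a flavor of what these actions do, here are plain-Python sketches of two of the simplest transformations, relative-frequency and presence-absence, on a toy table (QIIME 2 itself operates on FeatureTable artifacts, not dicts; the sample and feature IDs are made up):

```python
def relative_frequency(table):
    """Convert per-sample counts to proportions (each sample sums to 1)."""
    out = {}
    for sample, row in table.items():
        total = sum(row.values())
        out[sample] = {f: n / total for f, n in row.items()}
    return out

def presence_absence(table):
    """Convert counts to 1 (feature present) / 0 (feature absent)."""
    return {s: {f: int(n > 0) for f, n in row.items()}
            for s, row in table.items()}

table = {"sample-1": {"f0": 3, "f1": 1}, "sample-2": {"f0": 0, "f1": 5}}
rel = relative_frequency(table)
pa = presence_absence(table)
```

Many downstream statistics assume one of these representations, which is why the plugin exposes them as explicit, recorded transformation steps.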
Congratulations! You’ve made it through importing, demultiplexing, and denoising/clustering your data, which are the most complicated and difficult steps for most users (if only because there are so many ways to do it!). If you’ve made it this far, the rest should be easy. Now begins the fun. 🍾
Taxonomy classification (or annotation) and taxonomic analyses¶
For many experiments, investigators aim to identify the organisms that are present in a sample. For example:
- How do the genera or species in a system change over time?
- Are there any potential human pathogens in this patient’s sample?
- What’s swimming in my wine? 🍷🤑
We can do this by comparing our feature sequences (be they ASVs or OTUs) to a reference database of sequences with known taxonomic composition. Simply finding the closest alignment is not really good enough -- because other sequences that are equally close matches or nearly as close may have different taxonomic annotations. So we use taxonomy classifiers to determine the closest taxonomic affiliation with some degree of confidence or consensus (which may not be a species name if one cannot be predicted with certainty!), based on alignment, k-mer frequencies, etc. Those interested in learning more about the relative performance of the taxonomy classifiers in QIIME 2 can read until the cows come home. And if you want to learn about how the algorithms work, you can refer to the Sequence Homology Searching chapter of An Introduction to Applied Bioinformatics. 🐄🐄🐄
Figure 5 shows what a taxonomy classification workflow might look like.

Figure 5:Flowchart of taxonomic annotation workflows in QIIME 2.
Alignment-based taxonomic classification¶
q2-feature-classifier contains three different classification methods. classify-consensus-blast and classify-consensus-vsearch are both alignment-based methods that find a consensus assignment across N top hits. These methods take reference database FeatureData[Taxonomy] and FeatureData[Sequence] files directly, and do not need to be pre-trained.
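The consensus idea can be sketched in a few lines: keep descending the taxonomy ranks as long as enough of the top hits agree. This toy version uses a majority threshold of 0.51 and made-up taxonomy strings; it illustrates the logic only, not the q2-feature-classifier implementation:

```python
from collections import Counter

def consensus_taxonomy(hit_taxonomies, min_fraction=0.51):
    """Deepest taxonomic prefix shared by >= min_fraction of the top hits.

    Taxonomies are semicolon-delimited, coarse to fine. With
    min_fraction > 0.5 the majority prefix at each rank is unique, so the
    consensus can be built rank by rank; resolution stops at the first
    rank without sufficient agreement.
    """
    split = [t.split(";") for t in hit_taxonomies]
    n = len(split)
    consensus = []
    for rank in range(min(len(t) for t in split)):
        prefix, count = Counter(
            tuple(t[:rank + 1]) for t in split).most_common(1)[0]
        if count / n >= min_fraction:
            consensus = list(prefix)
        else:
            break
    return ";".join(consensus) if consensus else "Unassigned"

hits = [
    "k__Bacteria;p__Firmicutes;g__Lactobacillus",
    "k__Bacteria;p__Firmicutes;g__Lactobacillus",
    "k__Bacteria;p__Firmicutes;g__Streptococcus",
]
result = consensus_taxonomy(hits)
```

When the top hits disagree at genus level, the assignment simply stops at the deepest rank they still share, which is why consensus classifiers can return, say, a family name rather than a species.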
Machine-learning-based taxonomic classification¶
Machine-learning-based classification methods are available through classify-sklearn, and theoretically can apply any of the classification methods available in scikit-learn.
These classifiers must be trained, e.g., to learn which features best distinguish each taxonomic group, adding an additional step to the classification process.
Classifier training is reference database- and marker-gene-specific and only needs to happen once per marker-gene/reference database combination; that classifier may then be re-used as many times as you like without needing to re-train!
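To see why training is needed, here is a toy Naive Bayes taxonomy classifier built on k-mer counts. This is only a sketch of the general approach; the real classify-sklearn implementation relies on scikit-learn and is far more sophisticated, and the reference sequences and taxon names below are made up:

```python
import math
from collections import Counter

def kmers(seq, k=4):
    """Overlapping k-mers of a sequence (the classifier's features)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def train(reference, k=4):
    """Learn per-taxon k-mer log-probabilities from labeled reference
    sequences, with add-one smoothing. This is the expensive, reusable
    'training' step. Returns {taxon: {kmer: log_prob}}."""
    vocab = {m for seq, _ in reference for m in kmers(seq, k)}
    model = {}
    for taxon in {t for _, t in reference}:
        counts = Counter(m for seq, t in reference
                         if t == taxon for m in kmers(seq, k))
        total = sum(counts.values()) + len(vocab)
        model[taxon] = {m: math.log((counts[m] + 1) / total) for m in vocab}
    return model

def classify(seq, model, k=4):
    """Assign the taxon with the highest summed log-likelihood."""
    def score(taxon):
        probs = model[taxon]
        default = min(probs.values())  # unseen k-mers get the smallest prob
        return sum(probs.get(m, default) for m in kmers(seq, k))
    return max(model, key=score)

reference = [
    ("ACGTACGTACGT", "g__TaxonA"),  # hypothetical reference data
    ("TTTTGGGGTTTT", "g__TaxonB"),
]
model = train(reference)
prediction = classify("ACGTACGTA", model)
```

The `train`/`classify` split mirrors the workflow described above: training is the slow, database-specific step, and the resulting model can then classify any number of query sequences.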
Training your own feature classifiers.¶
If you’re working with an uncommon marker gene, you may need to train your own feature classifier.
This is possible following the steps in the classifier training tutorial.
The rescript plugin also contains many tools that can be useful in preparing reference data for training classifiers.
Most users don’t need to train their own classifiers however, as the QIIME 2 developers provide classifiers to the public for common marker genes in the QIIME 2 Library.
🎅🎁🎅🎁🎅🎁
Environment-weighted classifiers¶
Typical Naive Bayes classifiers treat all reference sequences as being equally likely to be observed in a sample. Environment-weighted taxonomic classifiers, on the other hand, use public microbiome data to weight taxa by their past frequency of being observed in specific sample types. This can improve the accuracy and the resolution of marker gene classification, and we recommend using weighted classifiers when possible. You can find environment-weighted classifiers for 16S rRNA in the QIIME 2 Library. If the environment type that you’re studying isn’t one of the ones that pre-trained classifiers are provided for, the “diverse weighted” classifiers may still be relevant. These are trained on weights from multiple different environment types, and have been shown to perform better than classifiers that assume equal weights for all taxa.
Which feature classification method is best?¶
They are all pretty good, otherwise we wouldn’t bother exposing them in q2-feature-classifier. But in general classify-sklearn with a Naive Bayes classifier can slightly outperform other methods we’ve tested based on several criteria for classification of 16S rRNA gene and fungal ITS sequences.
It can be more difficult and frustrating for some users, however, since it requires that additional training step.
That training step can be memory intensive, becoming a barrier for some users who are unable to use the pre-trained classifiers.
Some users also prefer the alignment-based methods because their mode of operation is much more transparent and their parameters easier to manipulate.
Feature classification can be slow¶
Runtime of feature classifiers is a function of the number of sequences to be classified, and the number of reference sequences. If runtime is an issue for you, consider filtering low-abundance features out of your sequences file before classifying (e.g., those that are present in only a single sample), and use smaller reference databases if possible. In practice, in “normal size” sequencing experiments (whatever that means 😜) we see variations between a few minutes (a few hundred features) to hours or days (hundreds of thousands of features) for classification to complete. If you want to hang some numbers on there, check out our benchmarks for classifier runtime performance. 🏃⏱️
Feature classification can be memory intensive¶
Generally at least 8 GB of RAM are required, though 16 GB is better. Memory requirements are generally related to the size of the reference database, and in some cases 32 GB of RAM or more are required.
Examples of using classify-sklearn are shown in the Moving Pictures tutorial 🎥.
Figure 5 should make the other classifier methods reasonably clear.
All classifiers produce a FeatureData[Taxonomy] artifact, tabulating the taxonomic annotation for each query sequence.
If you want to review those, or compare them across different classifiers, refer back to Reviewing information about observed sequences.
Taxonomic analysis¶
Taxonomic classification opens us up to a whole new world of possibilities. 🌎
Here are some popular actions that are enabled by having a FeatureData[Taxonomy] artifact:
- Collapse your feature table with taxa collapse! This groups all features that share the same taxonomic assignment into a single feature. That taxonomic assignment becomes the feature ID in the new feature table. This feature table can be used in all the same ways as the original. Some users may be specifically interested in performing, e.g., taxonomy-based diversity analyses, but at the very least anyone assigning taxonomy is probably interested in assessing differential abundance of those taxa. Comparing differential abundance analyses using taxa as features versus using ASVs or OTUs as features can be diagnostic and informative for various analyses.
- Plot your taxonomic composition to see the abundance of various taxa in each of your samples. Check out taxa barplot and feature-table heatmap for more details. 📊
- Filter your feature table and feature sequences to remove certain taxonomic groups. This is useful for removing known contaminants or non-target groups, e.g., host DNA including mitochondrial or chloroplast sequences. It can also be useful for focusing on specific groups for deeper analysis. See the filtering tutorial for more details and examples. 🌿🐀
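The collapse operation itself is simple to picture: truncate each feature’s taxonomy to the desired level and sum the counts of features that end up with the same label. A toy sketch (the taxonomy strings, feature IDs, and counts are all made up):

```python
def collapse(table, taxonomy, level):
    """Collapse a feature table so that features sharing a taxonomic
    assignment (truncated to `level` ranks) are summed into one feature."""
    out = {}
    for sample, row in table.items():
        collapsed = {}
        for feature, count in row.items():
            taxon = ";".join(taxonomy[feature].split(";")[:level])
            collapsed[taxon] = collapsed.get(taxon, 0) + count
        out[sample] = collapsed
    return out

taxonomy = {
    "f0": "k__Bacteria;p__Firmicutes;g__Lactobacillus",
    "f1": "k__Bacteria;p__Firmicutes;g__Streptococcus",
    "f2": "k__Bacteria;p__Bacteroidetes;g__Bacteroides",
}
table = {"sample-1": {"f0": 3, "f1": 2, "f2": 5}}
phylum_table = collapse(table, taxonomy, level=2)  # level 2 ~ phylum here
```

After collapsing at level 2, the two Firmicutes features merge into one row, and the taxonomy string becomes the new feature ID, just as described above.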
Sequence alignment and phylogenetic reconstruction¶
Some diversity metrics - notably Faith’s Phylogenetic Diversity (PD) and UniFrac - integrate the phylogenetic similarity of features. If you are sequencing phylogenetic markers (e.g., 16S rRNA genes), you can build a phylogenetic tree that can be used for computing these metrics.
The different options for aligning sequences and producing a phylogeny are shown in the flowchart below, and can be classified as de novo or reference-based. For a detailed discussion of alignment and phylogeny building, see the q2-phylogeny tutorial and q2-fragment-insertion. 🌳

Figure 6:Flowchart of alignment and phylogenetic reconstruction workflows in QIIME 2.
Now that we have our rooted phylogenetic tree (i.e., an artifact of class Phylogeny[Rooted]), let’s use it!
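As a preview of how the tree feeds into diversity metrics, here is a toy calculation of Faith’s PD: the total length of the branches connecting the observed features to the root. The tree below is hypothetical, and real phylogenetic metrics are computed by q2-diversity from your Phylogeny[Rooted] artifact:

```python
def faith_pd(observed, branches):
    """Faith's Phylogenetic Diversity: total length of all branches that
    lie on a path from the root to an observed tip.

    `branches` is a list of (length, tips_below) pairs, where tips_below
    is the set of tips descending from that branch.
    """
    return sum(length for length, tips in branches if observed & tips)

# A toy rooted tree with tips A, B, C:
#        root
#       /    \          root->AB ancestor: 1.0   root->C: 1.0
#     AB      C         AB->A: 0.5               AB->B: 0.5
#    /  \
#   A    B
branches = [
    (1.0, {"A", "B"}),
    (0.5, {"A"}),
    (0.5, {"B"}),
    (1.0, {"C"}),
]
pd_ab = faith_pd({"A", "B"}, branches)  # close relatives
pd_ac = faith_pd({"A", "C"}, branches)  # distant relatives
```

Note that the sample containing two distant relatives (A and C) scores higher than the one containing two close relatives (A and B), even though both contain two features: that is exactly what “phylogenetic” diversity adds over plain richness.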
Diversity analysis¶
In microbiome experiments, investigators frequently wonder about things like:
- How many different species/OTUs/ASVs are present in my samples?
- Which of my samples represent more phylogenetic diversity?
- Does the microbiome composition of my samples differ based on sample categories (e.g., healthy versus disease)?
- What factors (e.g., pH, elevation, blood pressure, body site, or host species just to name a few examples) are correlated with differences in microbial composition and biodiversity?
These questions can be answered by alpha- and beta-diversity analyses. Alpha diversity measures the level of diversity within individual samples. Beta diversity measures assess the dissimilarity between samples. We can then use this information to statistically test whether alpha diversity is different between groups of samples (indicating, for example, that those groups have more/less species richness) and whether beta diversity is greater across groups (indicating, for example, that samples within a group are more similar to each other than those in another group, suggesting that membership within these groups is shaping the microbial composition of those samples).
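Two classic metrics make the alpha/beta distinction concrete: Shannon diversity (within a sample) and Bray-Curtis dissimilarity (between samples). A minimal sketch on made-up counts (QIIME 2 computes these and many other metrics for you):

```python
import math

def shannon(counts):
    """Shannon diversity (alpha): -sum(p * log2(p)) over the feature
    proportions within a single sample."""
    total = sum(counts)
    return -sum((n / total) * math.log2(n / total) for n in counts if n)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity (beta) between two samples' counts,
    paired by feature: 1 - 2 * shared / (sum(a) + sum(b))."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))

even = shannon([5, 5, 5, 5])     # four equally abundant features
skewed = shannon([17, 1, 1, 1])  # same richness, but less even
dissim = bray_curtis([5, 5, 0], [0, 5, 5])
```

The even community scores higher than the skewed one despite identical richness (alpha diversity rewards evenness), and the two samples sharing only one feature land halfway between identical (0) and completely disjoint (1) on the Bray-Curtis scale.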
Different types of diversity analyses in QIIME 2 are exemplified in the Moving Pictures tutorial 🎥. The actions used to generate diversity artifacts are shown in Figure 7, and many other tools can operate on these results.

Figure 7:Flowchart of diversity analysis workflows in QIIME 2.
The q2-diversity plugin contains many different useful actions.
Check them out to learn more.
As you can see in the flowchart, the diversity core-metrics* pipelines (core-metrics and core-metrics-phylogenetic) encompass many different core diversity commands, and in the process produce the main diversity-related artifacts that can be used in downstream analyses.
These are:
- SampleData[AlphaDiversity] artifacts, which contain alpha diversity estimates for each sample in your feature table. This is the chief artifact for alpha diversity analyses.
- DistanceMatrix artifacts, containing the pairwise distance/dissimilarity between each pair of samples in your feature table. This is the chief artifact for beta diversity analyses.
- PCoAResults artifacts, containing principal coordinates ordination results for each distance/dissimilarity metric. Principal coordinates analysis is a dimension reduction technique, facilitating visual comparisons of sample (dis)similarities in 2D or 3D space. Learn more about ordination in Ordination Methods for Ecologists and in the Machine learning in bioinformatics section of An Introduction to Applied Bioinformatics (Bolyen et al., 2018).
These are the main diversity-related artifacts.
We can re-use these data in all sorts of downstream analyses, or in the various actions of q2-diversity shown in the flowchart.
Many of these actions are demonstrated in the Moving Pictures tutorial 🎥 so head on over there to learn more!
Note that there are many different alpha- and beta-diversity metrics that are available in QIIME 2. To learn more (and figure out whose paper you should be citing!), check out that neat resource, which was contributed by a friendly QIIME 2 user to enlighten all of us. Thanks Stephanie! 😁🙏😁🙏😁🙏
Fun with feature tables¶
At this point you have a feature table, taxonomy classification results, alpha diversity, and beta diversity results. Oh my! 🦁🐯🐻
Taxonomic and diversity analyses, as described above, are the basic types of analyses that most QIIME 2 users are probably going to need to perform at some point. However, this is only the beginning, and there are so many more advanced analyses at our fingertips. 🖐️⌨️

Figure 8:Flowchart of “downstream” analysis workflows in QIIME 2. Note: this figure needs some updates; specifically, gneiss was deprecated and is no longer part of the amplicon distribution.
We are only going to give a brief overview, since each of these analyses has its own in-depth tutorial to guide us:
- Analyze longitudinal data: q2-longitudinal is a plugin for performing statistical analyses of longitudinal experiments, i.e., where samples are collected from individual patients/subjects/sites repeatedly over time. This includes longitudinal studies of alpha and beta diversity, and some really awesome, interactive plots. 📈🍝
- Predict the future (or the past) 🔮: q2-sample-classifier is a plugin for machine-learning 🤖 analyses of feature data. Both classification and regression models are supported. This allows you to do things like:
  - predict sample metadata as a function of feature data (e.g., can we use a fecal sample to predict cancer susceptibility? Or predict wine quality based on the microbial composition of grapes before fermentation?). 🍇
  - identify features that are predictive of different sample characteristics. 🚀
  - quantify rates of microbial maturation (e.g., to track normal microbiome development in the infant gut and the impacts of persistent malnutrition or antibiotics, diet, and delivery mode). 👶
  - predict outliers and mislabeled samples. 👹
- Differential abundance testing is used to determine which features are significantly more/less abundant in different groups of samples. QIIME 2 currently supports a few different approaches to differential abundance testing, including ancom-bc in q2-composition. 👾👾👾
- Evaluate and control data quality: q2-quality-control is a plugin for evaluating and controlling sequence data quality. This includes actions that:
  - test the accuracy of different bioinformatic or molecular methods, or of run-to-run quality variation. These actions are typically used if users have samples with known compositions, e.g., mock communities, since accuracy is calculated as the similarity between the observed and expected compositions, sequences, etc. But more creative uses may be possible...
  - filter sequences based on alignment to a reference database, or that contain specific short sections of DNA (e.g., primer sequences). This is useful for removing sequences that match a specific group of organisms, non-target DNA, or other nonsense. 🙃
And that’s just a brief overview! QIIME 2 continues to grow, so stay tuned for more plugins in future releases 📻, and keep your eyes peeled for stand-alone plugins that will continue to expand the functionality available in QIIME 2.
A good next step is to work through the Moving Pictures tutorial 🎥, if you haven’t done so already. That will help you learn how to actually use all of the functionality discussed here on real microbiome sequence data.
Now go forth and have fun! 💃
This is a guide for novice QIIME 2 users, and particularly for those who are new to microbiome research. For experienced users who are already well versed in microbiome analysis (and those who are averse to uncontrolled use of emoji) mosey on over to .
Welcome all newcomers! 👋 This guide will give you a conceptual overview of many of the plugins and actions available in QIIME 2, and guide you to relevant documentation for deeper exploration. As an Explanation article, this document doesn’t provide specific commands to run, but rather discusses at a higher level what your analysis workflow might entail. If you want specific commands that you can run and then adapt for your own work, our Tutorial articles are more aligned with what you’re looking for. We generally recommend starting with the Moving Pictures tutorial 🎥.
Consider this document to be your treasure map: QIIME 2 actions are the stepping stones on your path, and the flowcharts below will tell you where all the goodies are buried. 🗺️
Remember, many paths lead from the foot of the mountain, but at the peak we all gaze at the same moon. 🌕
Let’s get oriented¶
Flowcharts¶
Before we begin talking about specific plugins and actions, we will discuss a conceptual overview of a typical workflow for analyzing marker gene sequence data. And before we look at that overview, we must look at the key to our treasure map:

Figure 1:Each type of result (i.e., Artifacts and Visualizations) and action (i.e., methods, visualizers, and pipelines) is represented by a different color-coded node. The edges connecting each node are either solid (representing either required input or output) or dashed (representing optional input).
In the flowcharts below:
- Actions are labeled with the name of the plugin and the name of the action. To learn more about how to use a specific plugin and action, you can look it up in Available plugins.
- Artifacts are labeled by their artifact class.
- Visualizations are variously labeled as “visualization,” some name that represents the information shown in that visualization, or replaced with an image representing some of the tasty information you might find inside that visualization... 🍙
Useful points for beginners¶
Just a few more important points before we go further:
- The guide below is not exhaustive by any means.
It only covers some of the chief actions in the QIIME 2 amplicon distribution.
There are many more actions and plugins to discover.
Curious to learn more?
Refer to Available plugins, or if you’re working on the command line, call
qiime --help
. - The flowcharts below are designed to be as simple as possible, and hence omit many of the inputs (particularly optional inputs and metadata) and outputs (particularly statistical summaries and other minor outputs) and all of the possible parameters from most actions. Many additional actions (e.g., for displaying statistical summaries or fiddling with feature tables 🎻) are also omitted. Now that you know all about the help documentation (Available plugins), use it to learn more about individual actions, and other actions present in a plugin (hint: if a plugin has additional actions not described here, they are probably used to examine the output of other actions in that plugin).
- Metadata is a central concept in QIIME 2. We do not extensively discuss metadata in this guide. Instead, find discussion of metadata in .
- There is no one way to do things in QIIME 2. Nor is there a “QIIME 2” approach. Many paths lead from the foot of the mountain... ⛰️ Many of the plugins and actions in QIIME 2 wrap independent software or pre-existing methods. The QIIME 2 Framework (Q2F), discussed in Using QIIME 2, is the glue that makes the magic happen.
- Do not forget to cite appropriately! Unsure what to cite? To see the a plugin or method’s relevant citations, refer its help text. Or, better yet, view an artifact or visualization using QIIME 2 View. The “citations” tab will contain information on all relevant citations used for the generation of that file. Groovy. 😎
💃💃💃
Conceptual overview¶
Now let us examine a conceptual overview of the various possible workflows for examining marker gene sequence data Figure 2. QIIME 2 allows you to enter or exit anywhere you’d like, so you can use QIIME 2 for any or all of these steps.

Figure 2:Flowchart providing an overview of a typical QIIME 2-based microbiome marker gene analysis workflow. The edges and nodes in this overview do not represent specific actions or data types, but instead represent conceptual categories, e.g., the basic types of data or analytical goals we might have in an experiment. Discussion of these steps and terms follows.
All data must be imported into a QIIME 2 artifact to be used by a QIIME 2 action (with the exception of metadata).
Most users start with either multiplexed (e.g., between one and three FASTQ files) or demuliplexed (e.g., a collection of n
.fastq
files, where n
is the number of samples, or two-times the number of samples) raw sequence data.
If possible, we recommend starting with demultiplexed sequence data - this prevents you from having to understand how sequences were multiplexed and how they need to be demultiplexed.
Whoever did your sequencing should already have that information and know how to do this.
Others users may start downstream, because some data processing has already been performed. For example, you can also start your QIIME 2 analysis with a feature table (.biom
or .tsv
file) generated with some other tool.
How to import and export data helps you identify what type of data you have, and provides specific instructions on importing different types of data.
Now that we understand that we can actually enter into this overview workflow at nearly any of the nodes, let us walk through individual sections.
- All marker gene sequencing experiments begin, at some point or another, as multiplexed sequence data.
This is probably in
.fastq
files that containing DNA sequences and quality scores for each base. - The sequence data must be demultiplexed, such that each observed sequence read is associated with the sample that it was observed in, or discarded if its sample of origin could not be determined.
- Reads then undergo quality control (i.e., denoising), and amplicon sequence variants (ASVs) or operational taxonomic units (OTUs) should be defined. The goals of these steps are to remove sequencing errors and to dereplicate sequences to make downstream analyses more performant. These steps result in: a. a feature table that tabulates counts of ASVs (or OTUs) on a per-sample basis, and b. feature sequences - a mapping of ASV (or OTU) identifiers to the sequences they represent.
These artifacts (the feature table and feature sequences) are central to most downstream analyses. Common analyses include:
- Taxonomic annotation of sequences, which lets you determine with taxa (e.g., species, genera, phyla) are present.
- Alpha and beta diversity analyses, or measures of diversity within and between samples, respectively. These enable assessment of how similar or different samples are to one another. Some diversity metrics integrate measures of phylogenetic similarity between individual features. If you are sequencing phylogenetic markers (e.g., 16S rRNA genes), you can construct a phylogenetic tree from your feature sequences to use when calculating phylogenetic diversity metrics.
- Differential abundance testing, to determine which features (OTUs, ASVs, taxa, etc.) are significantly more/less abundant in different experimental groups.
This is just the beginning, and many other statistical tests and plotting methods are at your fingertips in QIIME 2 and in the lands beyond. The world is your oyster. Let’s dive in. 🏊
Demultiplexing¶
Okay! Imagine we have just received some FASTQ data, hot off the sequencing instrument. Most next-gen sequencing instruments have the capacity to analyze hundreds or even thousands of samples in a single lane/run; we do so by multiplexing these samples, which is just a fancy word for mixing a whole bunch of stuff together. How do we know which sample each read came from? This is typically done by appending a unique barcode (a.k.a. index or tag) sequence to one or both ends of each sequence. Detecting these barcode sequences and mapping them back to the samples they belong to allows us to demultiplex our sequences.
You (or whoever prepared and sequenced your samples) should know which barcode is associated with each sample -- if you do not know, talk to your lab mates or sequencing center. Include this barcode information in your sample metadata file.
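To make the idea concrete, here is a toy sketch of exact-match demultiplexing in plain Python. This is not QIIME 2 code (q2-demux also handles barcode errors, quality scores, and real file formats), and the barcodes and sample IDs below are made up for illustration:

```python
# Toy sketch of demultiplexing: look up the barcode at the start of each
# read in a barcode-to-sample map built from the sample metadata.
# The barcodes and sample IDs here are hypothetical.
BARCODE_TO_SAMPLE = {"ACGT": "sample-1", "TGCA": "sample-2"}
BARCODE_LEN = 4

def demultiplex(reads):
    """Assign each read to its sample; reads with unknown barcodes are discarded."""
    per_sample = {}
    discarded = []
    for read in reads:
        barcode, insert = read[:BARCODE_LEN], read[BARCODE_LEN:]
        sample = BARCODE_TO_SAMPLE.get(barcode)
        if sample is None:
            discarded.append(read)  # sample of origin unknown
        else:
            per_sample.setdefault(sample, []).append(insert)
    return per_sample, discarded

per_sample, discarded = demultiplex(["ACGTTTGG", "TGCAAACC", "GGGGTTTT"])
```

The real actions do the same bookkeeping at scale, while also trimming the barcode and recording per-sample read counts for you.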
The process of demultiplexing (as it occurs in QIIME 2) will look something like Figure 3 (ignore the right-hand side of this flow chart for now).

Figure 3:Flowchart of demultiplexing and denoising workflows in QIIME 2.
This flowchart describes all demultiplexing steps that are currently possible in QIIME 2, depending on the type of raw data you have imported.
Usually only one of the different demultiplexing actions available in q2-demux or q2-cutadapt will be applicable for your data, and that is all you will need.
Read more about demultiplexing and give it a spin with the Moving Pictures tutorial 🎥. That tutorial covers Earth Microbiome Project format data. If instead you have barcodes and primers in-line in your reads, see the cutadapt tutorials. Have dual-indexed reads, mixed-orientation reads, or some other unusual format? Search the QIIME 2 Forum for advice.
Paired-end read joining¶
If you’re working with Illumina paired-end reads, they will typically need to be joined at some point in the analysis.
If you read How to merge Illumina paired-end reads, you will see that this happens automatically during denoising with q2-dada2. However, if you want to use q2-deblur or an OTU clustering method (as described in more detail below), use q2-vsearch to join these reads before proceeding, as shown in Figure 3.
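Conceptually, read joining looks for the overlap between the 3′ end of the forward read and the reverse complement of the reverse read. Here is a toy sketch assuming an exact overlap; real joiners (e.g., vsearch, as wrapped by q2-vsearch) also use quality scores and tolerate mismatches in the overlap region:

```python
# Toy sketch of paired-end read joining: reverse-complement the reverse
# read, then find where it overlaps the end of the forward read.
def revcomp(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def join_pair(fwd, rev, min_overlap=4):
    """Join a forward read and a reverse read into one merged sequence."""
    rev = revcomp(rev)
    # Longest suffix of fwd that exactly matches a prefix of rev:
    for k in range(min(len(fwd), len(rev)), min_overlap - 1, -1):
        if fwd[-k:] == rev[:k]:
            return fwd + rev[k:]
    return None  # overlap too short -- pair cannot be joined

# Toy reads drawn from the fragment AAAACCCCGGGGTTTT
# (the reverse read is given as sequenced, i.e., reverse-complemented):
joined = join_pair("AAAACCCCGGGG", "AAAACCCCGG")
```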
If you are beginning to pull out your hair and foam at the mouth, do not despair: QIIME 2 tends to get easier the further we travel in the “general overview” (Figure 2). Importing and demultiplexing raw sequencing data happens to be the most frustrating part for most new users because there are so many different ways that marker gene data can be generated. But once you get the hang of it, it’s a piece of cake. 🍰
Denoising and clustering¶
Congratulations on getting this far! Denoising and clustering steps are slightly less confusing than importing and demultiplexing! 🎉😬🎉
The names for these steps are very descriptive:
- We denoise our sequences to remove and/or correct noisy reads. 🔊
- We dereplicate our sequences to reduce repetition and file size/memory requirements in downstream steps (don’t worry! we keep count of each replicate). 🕵️
- We (optionally) cluster sequences to collapse similar sequences (e.g., those that are ≥ 97% similar to each other) into single replicate sequences. This process, also known as OTU picking, was once a common procedure, used to simultaneously dereplicate sequences and perform a sort of quick-and-dirty denoising (to capture stochastic sequencing and PCR errors, which should be rare and similar to more abundant centroid sequences). Skip clustering in favor of denoising, unless you have a really strong reason not to.
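Dereplication is the easiest of these to picture. A toy sketch in plain Python (this mirrors the idea behind vsearch dereplicate-sequences, not its implementation):

```python
# Toy sketch of dereplication: collapse identical reads and keep their
# per-sample counts. The counts become a feature table; the unique
# sequences become the representative sequences.
from collections import Counter

def dereplicate(reads_per_sample):
    """Return a per-sample feature table plus the unique (representative) sequences."""
    table = {sample: Counter(reads) for sample, reads in reads_per_sample.items()}
    rep_seqs = sorted({seq for counts in table.values() for seq in counts})
    return table, rep_seqs

table, rep_seqs = dereplicate({
    "sample-1": ["ACGT", "ACGT", "TTTT"],
    "sample-2": ["ACGT"],
})
```

Note that no information about abundance is thrown away: each unique sequence keeps its observation count in every sample. 🕵️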
Denoising¶
Let’s start with denoising, which is depicted on the right-hand side of Figure 3.
The denoising methods currently available in QIIME 2 include DADA2 and Deblur.
You can learn more about those methods by reading the original publications for each.
Examples of using both are presented in Moving Pictures tutorial 🎥.
Note that deblur (and also vsearch dereplicate-sequences) should be preceded by basic quality-score-based filtering, but this is unnecessary for DADA2.
Both Deblur and DADA2 contain internal chimera checking methods and abundance filtering, so additional filtering should not be necessary following these methods.
🦁🐐🐍
To put it simply, these methods filter out noisy sequences, correct errors in marginal sequences (in the case of DADA2), remove chimeric sequences, remove singletons, join denoised paired-end reads (in the case of DADA2), and then dereplicate those sequences. 😎
The features produced by denoising methods go by many names, usually some variant of “sequence variant” (SV), “amplicon SV” (ASV), “actual SV”, “exact SV”... We tend to use amplicon sequence variant (ASV) in the QIIME 2 documentation, and we’ll stick with that here. 📏
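The “basic quality-score-based filtering” mentioned above boils down to decoding Phred scores from a FASTQ quality string and truncating or dropping low-quality reads. A toy sketch of the idea, with made-up thresholds loosely inspired by the defaults in q2-quality-filter (the real action, based on Bokulich et al. 2013, handles the details):

```python
# Toy sketch of quality-score-based read filtering.
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ (Phred+33) quality string into integer scores."""
    return [ord(ch) - offset for ch in quality_string]

def truncate_at_low_quality(seq, qual, min_q=4, keep_frac=0.75):
    """Truncate at the first base below min_q; drop reads left too short."""
    scores = phred_scores(qual)
    cut = next((i for i, q in enumerate(scores) if q < min_q), len(seq))
    return seq[:cut] if cut >= keep_frac * len(seq) else None

kept = truncate_at_low_quality("ACGTACGT", "IIIIIII#")   # only the last base is low quality
dropped = truncate_at_low_quality("ACGTACGT", "II######")  # too little survives
```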
Clustering¶
Next we will discuss clustering methods. Dereplication (the simplest clustering method, effectively producing 100% OTUs, i.e., all unique sequences observed in the dataset) is depicted in Figure 4, and is the necessary starting point for all other clustering methods in QIIME 2.

Figure 4:Flowchart of OTU clustering, chimera filtering, and abundance filtering workflows in QIIME 2.
q2-vsearch
implements three different OTU clustering strategies: de novo, closed reference, and open reference.
All should be preceded by basic quality-score-based filtering and followed by chimera filtering and aggressive OTU filtering (the treacherous trio, a.k.a. the Bokulich method).
🙈🙉🙊
The OTU clustering tutorial demonstrates use of several q2-vsearch clustering methods.
Don’t forget to read the chimera filtering tutorial as well.
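The core idea behind de novo clustering can be sketched in a few lines: process sequences from most to least abundant, and let each one either join an existing cluster or found a new one. The identity function below is a toy (positionwise matches on equal-length strings); real tools like vsearch compute proper alignments and apply many heuristics:

```python
# Greedy de novo clustering in miniature: each sequence joins the first
# existing centroid it matches at >= perc_identity, otherwise it founds
# a new cluster with itself as centroid.
def identity(a, b):
    """Toy identity metric: fraction of matching positions."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cluster_de_novo(seqs_by_abundance, perc_identity=0.97):
    clusters = {}  # centroid sequence -> member sequences
    for seq in seqs_by_abundance:  # most abundant first
        for centroid in clusters:
            if identity(seq, centroid) >= perc_identity:
                clusters[centroid].append(seq)
                break
        else:
            clusters[seq] = [seq]
    return clusters

a = "A" * 100        # abundant sequence -> centroid
b = "A" * 99 + "C"   # 99% identical to a -> joins a's cluster
c = "G" * 100        # unrelated -> founds its own cluster
clusters = cluster_de_novo([a, b, c])
```

Closed-reference clustering works the same way, except the centroids come from a reference database instead of the dataset itself.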
The feature table¶
The final products of all denoising and clustering methods/workflows are a FeatureTable[Frequency] (feature table) artifact and a FeatureData[Sequence] (representative sequences) artifact.
These are two of the most important artifact classes in a marker gene sequencing workflow, and are used for many downstream analyses, as discussed below.
Indeed, feature tables are crucial to any QIIME 2 analysis, as the central record of the counts of features per sample.
Such an important artifact deserves its own powerful plugin:
q2-feature-table plugin documentation
feature-table¶
This is a QIIME 2 plugin supporting operations on sample by feature tables, such as filtering, merging, and transforming tables.
- version: 2024.10.0
- website: https://github.com/qiime2/q2-feature-table
- user support: Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
Actions¶
Name | Type | Short Description |
---|---|---|
rarefy | method | Rarefy table |
subsample-ids | method | Subsample table |
presence-absence | method | Convert to presence/absence |
relative-frequency | method | Convert to relative frequencies |
transpose | method | Transpose a feature table. |
group | method | Group samples or features by a metadata column |
merge | method | Combine multiple tables |
merge-seqs | method | Combine collections of feature sequences |
merge-taxa | method | Combine collections of feature taxonomies |
rename-ids | method | Renames sample or feature ids in a table |
filter-samples | method | Filter samples from table |
filter-features-conditionally | method | Filter features from a table based on abundance and prevalence |
filter-features | method | Filter features from table |
filter-seqs | method | Filter features from sequences |
split | method | Split one feature table into many |
tabulate-feature-frequencies | method | Tabulate feature frequencies |
tabulate-sample-frequencies | method | Tabulate sample frequencies |
summarize | visualizer | Summarize table |
tabulate-seqs | visualizer | View sequence associated with each feature |
core-features | visualizer | Identify core features in table |
heatmap | visualizer | Generate a heatmap representation of a feature table |
summarize-plus | pipeline | Summarize table plus |
We will not discuss all actions of this plugin in detail here (some are mentioned below), but it performs many useful operations on feature tables, so familiarize yourself with its documentation!
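To demystify what a feature table is: just counts of features per sample. Two of the transforms in the table above are simple to sketch in plain Python (toy code on nested dicts; the plugin itself operates on biom tables):

```python
# Toy sketches of two q2-feature-table transforms.
def relative_frequency(table):
    """Convert each sample's counts to proportions summing to 1."""
    out = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        out[sample] = {f: n / total for f, n in counts.items()}
    return out

def presence_absence(table):
    """Convert counts to 1 (present) / 0 (absent)."""
    return {s: {f: int(n > 0) for f, n in counts.items()}
            for s, counts in table.items()}

table = {"sample-1": {"asv1": 30, "asv2": 10}}
rel = relative_frequency(table)
pa = presence_absence(table)
```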
Congratulations! You’ve made it through importing, demultiplexing, and denoising/clustering your data, which are the most complicated and difficult steps for most users (if only because there are so many ways to do it!). If you’ve made it this far, the rest should be easy. Now begins the fun. 🍾
Taxonomy classification (or annotation) and taxonomic analyses¶
For many experiments, investigators aim to identify the organisms that are present in a sample. For example:
- How do the genera or species in a system change over time?
- Are there any potential human pathogens in this patient’s sample?
- What’s swimming in my wine? 🍷🤑
We can do this by comparing our feature sequences (be they ASVs or OTUs) to a reference database of sequences with known taxonomic composition. Simply finding the closest alignment is not really good enough -- because other sequences that are equally close matches, or nearly as close, may have different taxonomic annotations. So we use taxonomy classifiers to determine the closest taxonomic affiliation with some degree of confidence or consensus (which may not be a species name if one cannot be predicted with certainty!), based on alignment, k-mer frequencies, etc. Those interested in learning more about the relative performance of the taxonomy classifiers in QIIME 2 can read until the cows come home. And if you want to learn about how the algorithms work, you can refer to the Sequence Homology Searching chapter of An Introduction to Applied Bioinformatics. 🐄🐄🐄
Figure 5 shows what a taxonomy classification workflow might look like.

Figure 5:Flowchart of taxonomic annotation workflows in QIIME 2.
Alignment-based taxonomic classification¶
q2-feature-classifier
contains three different classification methods.
classify-consensus-blast
and classify-consensus-vsearch
are both alignment-based methods that find a consensus assignment across N top hits.
These methods take reference database FeatureData[Taxonomy]
and FeatureData[Sequence]
files directly, and do not need to be pre-trained.
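The “consensus across N top hits” idea can be sketched as: walk the taxonomy of the top hits level by level (domain toward species) and keep a level only while enough hits agree, truncating where they diverge. This toy sketch (with made-up lineages, and a hypothetical min_fraction threshold) skips the alignment/search step that produces the hits in the first place:

```python
# Toy sketch of consensus taxonomy assignment from a list of top hits.
def consensus_taxonomy(hit_taxonomies, min_fraction=0.51):
    """hit_taxonomies: semicolon-delimited lineages of the top database hits."""
    split = [t.split(";") for t in hit_taxonomies]
    consensus = []
    for level in zip(*split):  # walk levels from domain downward
        best = max(set(level), key=level.count)
        if level.count(best) / len(level) >= min_fraction:
            consensus.append(best)
        else:
            break  # hits disagree at this level; stop here
    return ";".join(consensus)

hits = ["k__Bacteria;g__Lactobacillus;s__iners",
        "k__Bacteria;g__Lactobacillus;s__crispatus"]
assignment = consensus_taxonomy(hits)  # species-level hits disagree
```

This is why a consensus classifier can confidently return a genus while declining to name a species.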
Machine-learning-based taxonomic classification¶
Machine-learning-based classification methods are available through classify-sklearn
, and theoretically can apply any of the classification methods available in scikit-learn.
These classifiers must be trained, e.g., to learn which features best distinguish each taxonomic group, adding an additional step to the classification process.
Classifier training is reference database- and marker-gene-specific and only needs to happen once per marker-gene/reference database combination; that classifier may then be re-used as many times as you like without needing to re-train!
Training your own feature classifiers¶
If you’re working with an uncommon marker gene, you may need to train your own feature classifier.
This is possible following the steps in the classifier training tutorial.
The rescript
plugin also contains many tools that can be useful in preparing reference data for training classifiers.
Most users don’t need to train their own classifiers, however, as the QIIME 2 developers provide classifiers to the public for common marker genes in the QIIME 2 Library.
🎅🎁🎅🎁🎅🎁
Environment-weighted classifiers¶
Typical Naive Bayes classifiers treat all reference sequences as being equally likely to be observed in a sample. Environment-weighted taxonomic classifiers, on the other hand, use public microbiome data to weight taxa by their past frequency of being observed in specific sample types. This can improve the accuracy and the resolution of marker gene classification, and we recommend using weighted classifiers when possible. You can find environment-weighted classifiers for 16S rRNA in the QIIME 2 Library. If the environment type that you’re studying isn’t one of the ones that pre-trained classifiers are provided for, the “diverse weighted” classifiers may still be relevant. These are trained on weights from multiple different environment types, and have been shown to perform better than classifiers that assume equal weights for all taxa.
Which feature classification method is best?¶
They are all pretty good, otherwise we wouldn’t bother exposing them in q2-feature-classifier. But in general classify-sklearn with a Naive Bayes classifier can slightly outperform other methods we’ve tested, based on several criteria, for classification of 16S rRNA gene and fungal ITS sequences.
It can be more difficult and frustrating for some users, however, since it requires that additional training step.
That training step can be memory intensive, becoming a barrier for some users who are unable to use the pre-trained classifiers.
Some users also prefer the alignment-based methods because their mode of operation is much more transparent and their parameters easier to manipulate.
Feature classification can be slow¶
Runtime of feature classifiers is a function of the number of sequences to be classified, and the number of reference sequences. If runtime is an issue for you, considering filtering low-abundance features out of your sequences file before classifying (e.g., those that are present in only a single sample), and use smaller reference databases if possible. In practice, in “normal size” sequencing experiments (whatever that means 😜) we see variations between a few minutes (a few hundred features) to hours or days (hundreds of thousands of features) for classification to complete. If you want to hang some numbers on there, check out our benchmarks for classifier runtime performance. 🏃⏱️
Feature classification can be memory intensive¶
Generally at least 8 GB of RAM are required, though 16 GB is better. The memory required is generally related to the size of the reference database, and in some cases 32 GB of RAM or more are required.
Examples of using classify-sklearn
are shown in the Moving Pictures tutorial 🎥.
Figure 5 should make the other classifier methods reasonably clear.
All classifiers produce a FeatureData[Taxonomy]
artifact, tabulating the taxonomic annotation for each query sequence.
If you want to review those, or compare them across different classifiers, refer back to Reviewing information about observed sequences.
Taxonomic analysis¶
Taxonomic classification opens us up to a whole new world of possibilities. 🌎
Here are some popular actions that are enabled by having a FeatureData[Taxonomy]
artifact:
- Collapse your feature table with taxa collapse! This groups all features that share the same taxonomic assignment into a single feature. That taxonomic assignment becomes the feature ID in the new feature table. This feature table can be used in all the same ways as the original. Some users may be specifically interested in performing, e.g., taxonomy-based diversity analyses, but at the very least anyone assigning taxonomy is probably interested in assessing differential abundance of those taxa. Comparing differential abundance analyses using taxa as features versus using ASVs or OTUs as features can be diagnostic and informative for various analyses.
- Plot your taxonomic composition to see the abundance of various taxa in each of your samples. Check out taxa barplot and feature-table heatmap for more details. 📊
- Filter your feature table and feature sequences to remove certain taxonomic groups. This is useful for removing known contaminants or non-target groups, e.g., host DNA including mitochondrial or chloroplast sequences. It can also be useful for focusing on specific groups for deeper analysis. See the filtering tutorial for more details and examples. 🌿🐀
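What taxa collapse does is simple to sketch: sum the counts of all features whose taxonomy, truncated at the chosen level, is the same. A toy version on plain dicts (the lineages and counts are made up; the real action works on FeatureTable and FeatureData[Taxonomy] artifacts):

```python
# Toy sketch of collapsing a feature table by taxonomy.
from collections import defaultdict

def collapse(table, taxonomy, level):
    """Collapse ASV counts to taxa truncated at `level` ranks."""
    collapsed = {}
    for sample, counts in table.items():
        sums = defaultdict(int)
        for asv, n in counts.items():
            taxon = ";".join(taxonomy[asv].split(";")[:level])
            sums[taxon] += n
        collapsed[sample] = dict(sums)
    return collapsed

taxonomy = {"asv1": "k__Bacteria;p__Firmicutes",
            "asv2": "k__Bacteria;p__Firmicutes",
            "asv3": "k__Bacteria;p__Bacteroidota"}
table = {"sample-1": {"asv1": 5, "asv2": 3, "asv3": 2}}
phylum_table = collapse(table, taxonomy, level=2)
```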
Sequence alignment and phylogenetic reconstruction¶
Some diversity metrics - notably Faith’s Phylogenetic Diversity (PD) and UniFrac - integrate the phylogenetic similarity of features. If you are sequencing phylogenetic markers (e.g., 16S rRNA genes), you can build a phylogenetic tree that can be used for computing these metrics.
The different options for aligning sequences and producing a phylogeny are shown in the flowchart below, and can be classified as de novo or reference-based. For a detailed discussion of alignment and phylogeny building, see the q2-phylogeny tutorial and q2-fragment-insertion. 🌳

Figure 6:Flowchart of alignment and phylogenetic reconstruction workflows in QIIME 2.
Now that we have our rooted phylogenetic tree (i.e., an artifact of class Phylogeny[Rooted]
), let’s use it!
Diversity analysis¶
In microbiome experiments, investigators frequently wonder about things like:
- How many different species/OTUs/ASVs are present in my samples?
- Which of my samples represent more phylogenetic diversity?
- Does the microbiome composition of my samples differ based on sample categories (e.g., healthy versus disease)?
- What factors (e.g., pH, elevation, blood pressure, body site, or host species just to name a few examples) are correlated with differences in microbial composition and biodiversity?
These questions can be answered by alpha- and beta-diversity analyses. Alpha diversity measures the level of diversity within individual samples. Beta diversity measures assess the dissimilarity between samples. We can then use this information to statistically test whether alpha diversity is different between groups of samples (indicating, for example, that those groups have more/less species richness) and whether beta diversity is greater across groups (indicating, for example, that samples within a group are more similar to each other than those in another group, suggesting that membership within these groups is shaping the microbial composition of those samples).
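To make the distinction concrete, here is the arithmetic behind two of the simplest non-phylogenetic metrics: Shannon entropy (alpha diversity, within one sample) and Bray-Curtis dissimilarity (beta diversity, between two samples). QIIME 2 computes these with well-tested libraries; this toy sketch only shows the math:

```python
# Toy implementations of one alpha and one beta diversity metric.
import math

def shannon(counts):
    """Shannon index (natural log) of one sample's feature counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)

def bray_curtis(counts_a, counts_b):
    """Dissimilarity between two samples listing the same features in the same order."""
    shared = sum(min(a, b) for a, b in zip(counts_a, counts_b))
    return 1 - 2 * shared / (sum(counts_a) + sum(counts_b))

even = shannon([10, 10])     # two equally abundant features
uneven = shannon([19, 1])    # one dominant feature -> lower diversity
distance = bray_curtis([10, 0], [0, 10])  # no shared features
```

Phylogenetic metrics like Faith’s PD and UniFrac follow the same within-sample/between-sample split, but additionally weight features by the branch lengths they span on the phylogenetic tree.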
Different types of diversity analyses in QIIME 2 are exemplified in the Moving Pictures tutorial 🎥. The actions used to generate diversity artifacts are shown in Figure 7, and many other tools can operate on these results.

Figure 7:Flowchart of diversity analysis workflows in QIIME 2.
The q2-diversity
plugin contains many different useful actions.
Check them out to learn more.
As you can see in the flowchart, the diversity core-metrics* pipelines (core-metrics and core-metrics-phylogenetic) encompass many different core diversity commands, and in the process produce the main diversity-related artifacts that can be used in downstream analyses.
These are:
- SampleData[AlphaDiversity] artifacts, which contain alpha diversity estimates for each sample in your feature table. This is the chief artifact for alpha diversity analyses.
- DistanceMatrix artifacts, containing the pairwise distance/dissimilarity between each pair of samples in your feature table. This is the chief artifact for beta diversity analyses.
- PCoAResults artifacts, containing principal coordinates ordination results for each distance/dissimilarity metric. Principal coordinates analysis is a dimension reduction technique, facilitating visual comparisons of sample (dis)similarities in 2D or 3D space. Learn more about ordination in Ordination Methods for Ecologists and in the Machine learning in bioinformatics section of An Introduction to Applied Bioinformatics (Bolyen et al., 2018).
These are the main diversity-related artifacts.
We can re-use these data in all sorts of downstream analyses, or in the various actions of q2-diversity
shown in the flowchart.
Many of these actions are demonstrated in the Moving Pictures tutorial 🎥 so head on over there to learn more!
Note that there are many different alpha- and beta-diversity metrics that are available in QIIME 2. To learn more (and figure out whose paper you should be citing!), check out that neat resource, which was contributed by a friendly QIIME 2 user to enlighten all of us. Thanks Stephanie! 😁🙏😁🙏😁🙏
Fun with feature tables¶
At this point you have a feature table, taxonomy classification results, alpha diversity, and beta diversity results. Oh my! 🦁🐯🐻
Taxonomic and diversity analyses, as described above, are the basic types of analyses that most QIIME 2 users are probably going to need to perform at some point. However, this is only the beginning, and there are so many more advanced analyses at our fingertips. 🖐️⌨️

Figure 8:Flowchart of “downstream” analysis workflows in QIIME 2. Note: This figure needs some updates. Specifically, gneiss was deprecated and is no longer part of the amplicon distribution.
We are only going to give a brief overview, since each of these analyses has its own in-depth tutorial to guide us:
- Analyze longitudinal data: q2-longitudinal is a plugin for performing statistical analyses of longitudinal experiments, i.e., where samples are collected from individual patients/subjects/sites repeatedly over time. This includes longitudinal studies of alpha and beta diversity, and some really awesome, interactive plots. 📈🍝
- Predict the future (or the past) 🔮: q2-sample-classifier is a plugin for machine-learning 🤖 analyses of feature data. Both classification and regression models are supported. This allows you to do things like:
  - predict sample metadata as a function of feature data (e.g., can we use a fecal sample to predict cancer susceptibility? Or predict wine quality based on the microbial composition of grapes before fermentation?). 🍇
  - identify features that are predictive of different sample characteristics. 🚀
  - quantify rates of microbial maturation (e.g., to track normal microbiome development in the infant gut and the impacts of persistent malnutrition or antibiotics, diet, and delivery mode). 👶
  - predict outliers and mislabeled samples. 👹
- Differential abundance testing is used to determine which features are significantly more/less abundant in different groups of samples. QIIME 2 currently supports a few different approaches to differential abundance testing, including ancom-bc in q2-composition. 👾👾👾
- Evaluate and control data quality: q2-quality-control is a plugin for evaluating and controlling sequence data quality. This includes actions that:
  - test the accuracy of different bioinformatic or molecular methods, or of run-to-run quality variation. These actions are typically used if users have samples with known compositions, e.g., mock communities, since accuracy is calculated as the similarity between the observed and expected compositions, sequences, etc. But more creative uses may be possible...
  - filter sequences based on alignment to a reference database, or that contain specific short sections of DNA (e.g., primer sequences). This is useful for removing sequences that match a specific group of organisms, non-target DNA, or other nonsense. 🙃
And that’s just a brief overview! QIIME 2 continues to grow, so stay tuned for more plugins in future releases 📻, and keep your eyes peeled for stand-alone plugins that will continue to expand the functionality available in QIIME 2.
A good next step is to work through the Moving Pictures tutorial 🎥, if you haven’t done so already. That will help you learn how to actually use all of the functionality discussed here on real microbiome sequence data.
Now go forth and have fun! 💃
How to import and export data helps you identify what type of data you have, and provides specific instructions on importing different types of data.
Now that we understand that we can actually enter into this overview workflow at nearly any of the nodes, let us walk through individual sections.
- All marker gene sequencing experiments begin, at some point or another, as multiplexed sequence data.
This is probably in
.fastq
files that containing DNA sequences and quality scores for each base. - The sequence data must be demultiplexed, such that each observed sequence read is associated with the sample that it was observed in, or discarded if its sample of origin could not be determined.
- Reads then undergo quality control (i.e., denoising), and amplicon sequence variants (ASVs) or operational taxonomic units (OTUs) should be defined. The goals of these steps are to remove sequencing errors and to dereplicate sequences to make downstream analyses more performant. These steps result in: a. a feature table that tabulates counts of ASVs (or OTUs) on a per-sample basis, and b. feature sequences - a mapping of ASV (or OTU) identifiers to the sequences they represent.
These artifacts (the feature table and feature sequences) are central to most downstream analyses. Common analyses include:
- Taxonomic annotation of sequences, which lets you determine with taxa (e.g., species, genera, phyla) are present.
- Alpha and beta diversity analyses, or measures of diversity within and between samples, respectively. These enable assessment of how similar or different samples are to one another. Some diversity metrics integrate measures of phylogenetic similarity between individual features. If you are sequencing phylogenetic markers (e.g., 16S rRNA genes), you can construct a phylogenetic tree from your feature sequences to use when calculating phylogenetic diversity metrics.
- Differential abundance testing, to determine which features (OTUs, ASVs, taxa, etc) are significantly more/less abundant in different experimental groups.
This is just the beginning, and many other statistical tests and plotting methods are at your finger tips in QIIME 2 and in the lands beyond. The world is your oyster. Let’s dive in. 🏊
Demultiplexing¶
Okay! Imagine we have just received some FASTQ data, hot off the sequencing instrument. Most next-gen sequencing instruments have the capacity to analyze hundreds or even thousands of samples in a single lane/run; we do so by multiplexing these samples, which is just a fancy word for mixing a whole bunch of stuff together. How do we know which sample each read came from? This is typically done by appending a unique barcode (a.k.a. index or tag) sequence to one or both ends of each sequence. Detecting these barcode sequences and mapping them back to the samples they belong to allows us to demultiplex our sequences.
You (or whoever prepared and sequenced your samples) should know which barcode is associated with each sample -- if you do not know, talk to your lab mates or sequencing center. Include this barcode information in your sample metadata file.
The process of demultiplexing (as it occurs in QIIME 2) will look something like Figure 3 (ignore the right-hand side of this flow chart for now).

Figure 3:Flowchart of demultiplexing and denoising workflows in QIIME 2.
This flowchart describes all demultiplexing steps that are currently possible in QIIME 2, depending on the type of raw data you have imported.
Usually only one of the different demultiplexing actions available in q2-demux or q2-cutadapt will be applicable for your data, and that is all you will need.
Read more about demultiplexing and give it a spin with the Moving Pictures tutorial 🎥. That tutorial covers Earth Microbiome Project format data.
If instead you have barcodes and primers in-line in your reads, see the q2-cutadapt tutorials.
Have dual-indexed reads or mixed-orientation reads or some other unusual format? Search the QIIME 2 Forum for advice.
Paired-end read joining¶
If you’re working with Illumina paired-end reads, they will typically need to be joined at some point in the analysis. If you read How to merge Illumina paired-end reads, you will see that this happens automatically during denoising with q2-dada2. However, if you want to use q2-deblur or an OTU clustering method (as described in more detail below), use q2-vsearch to join these reads before proceeding, as shown in Figure 3.
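For the curious, read joining boils down to finding where the forward read and the reverse complement of the reverse read overlap. A toy sketch (exact-match overlap only; the real joiner wrapped by q2-vsearch scores mismatches and uses quality information):

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def join_pair(fwd, rev, min_overlap=4):
    """Join a forward read with the reverse complement of its reverse read,
    using the longest exact suffix/prefix overlap."""
    rc = revcomp(rev)
    for k in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        if fwd[-k:] == rc[:k]:
            return fwd + rc[k:]
    return None  # no sufficient overlap found -> pair cannot be joined

print(join_pair("AACCGGTT", "GGAACCGG"))  # AACCGGTTCC
```

This also shows why read length and amplicon length matter: if the two reads do not overlap enough, they cannot be joined.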
If you are beginning to pull your hair and foam at the mouth, do not despair: QIIME 2 tends to get easier the further we travel in the “general overview” (Figure 2). Importing and demultiplexing raw sequencing data happens to be the most frustrating part for most new users because there are so many different ways that marker gene data can be generated. But once you get the hang of it, it’s a piece of cake. 🍰
Denoising and clustering¶
Congratulations on getting this far! Denoising and clustering steps are slightly less confusing than importing and demultiplexing! 🎉😬🎉
The names for these steps are very descriptive:
- We denoise our sequences to remove and/or correct noisy reads. 🔊
- We dereplicate our sequences to reduce repetition and file size/memory requirements in downstream steps (don’t worry! we keep count of each replicate). 🕵️
- We (optionally) cluster sequences to collapse similar sequences (e.g., those that are ≥ 97% similar to each other) into single replicate sequences. This process, also known as OTU picking, was once a common procedure, used to simultaneously dereplicate sequences and perform a sort of quick-and-dirty denoising (to capture stochastic sequencing and PCR errors, which should be rare and similar to more abundant centroid sequences). Skip clustering in favor of denoising, unless you have a really strong reason not to.
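Dereplication is simple enough to sketch in a few lines of Python. This toy shows the core idea (collapse identical sequences, but keep the counts):

```python
from collections import Counter

def dereplicate(seqs):
    """Collapse identical sequences, keeping per-sequence counts --
    this is what 'we keep count of each replicate' means."""
    counts = Counter(seqs)
    # Order unique sequences by decreasing abundance, as derep tools do.
    return counts.most_common()

reads = ["ACGT", "ACGT", "ACGT", "TTTT", "TTTT", "GGGG"]
print(dereplicate(reads))
# [('ACGT', 3), ('TTTT', 2), ('GGGG', 1)]
```

Six reads become three unique sequences plus their counts — a much smaller object to carry through downstream steps, with no information about abundance lost.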
Denoising¶
Let’s start with denoising, which is depicted on the right-hand side of Figure 3.
The denoising methods currently available in QIIME 2 include DADA2 and Deblur.
You can learn more about those methods by reading the original publications for each.
Examples of using both are presented in the Moving Pictures tutorial 🎥.
Note that Deblur (and also vsearch dereplicate-sequences) should be preceded by basic quality-score-based filtering, but this is unnecessary for DADA2.
Both Deblur and DADA2 contain internal chimera checking methods and abundance filtering, so additional filtering should not be necessary following these methods.
🦁🐐🐍
To put it simply, these methods filter out noisy sequences, correct errors in marginal sequences (in the case of DADA2), remove chimeric sequences, remove singletons, join denoised paired-end reads (in the case of DADA2), and then dereplicate those sequences. 😎
The features produced by denoising methods go by many names, usually some variant of “sequence variant” (SV), “amplicon SV” (ASV), “actual SV”, “exact SV”... We tend to use amplicon sequence variant (ASV) in the QIIME 2 documentation, and we’ll stick with that here. 📏
Clustering¶
Next we will discuss clustering methods. Dereplication (the simplest clustering method, effectively producing 100% OTUs, i.e., all unique sequences observed in the dataset) is depicted in Figure 4, and is the necessary starting point for all other clustering methods in QIIME 2.

Figure 4:Flowchart of OTU clustering, chimera filtering, and abundance filtering workflows in QIIME 2.
q2-vsearch implements three different OTU clustering strategies: de novo, closed reference, and open reference. All should be preceded by basic quality-score-based filtering and followed by chimera filtering and aggressive OTU filtering (the treacherous trio, a.k.a. the Bokulich method).
🙈🙉🙊
The OTU clustering tutorial demonstrates use of several q2-vsearch clustering methods.
Don’t forget to read the chimera filtering tutorial as well.
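To build intuition for de novo clustering, here is a toy greedy centroid clusterer in Python. It is purely illustrative (real tools like VSEARCH align sequences, sort them by abundance, and are far more sophisticated):

```python
def identity(a, b):
    """Fraction of matching positions (toy metric; real tools align)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))

def denovo_cluster(seqs, threshold=0.97):
    """Greedy de novo clustering: each sequence joins the first centroid
    it matches at >= threshold identity, otherwise it founds a new cluster."""
    clusters = {}  # centroid sequence -> list of member sequences
    for seq in seqs:
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                break
        else:  # no centroid was close enough
            clusters[seq] = [seq]
    return clusters

# A permissive 75% threshold so the toy 8-mers actually cluster:
print(denovo_cluster(["ACGTACGT", "ACGTACGA", "TTTTTTTT"], threshold=0.75))
# {'ACGTACGT': ['ACGTACGT', 'ACGTACGA'], 'TTTTTTTT': ['TTTTTTTT']}
```

Note the quick-and-dirty denoising effect: the sequence with one mismatch (perhaps a PCR error) is absorbed into its near-identical neighbor's cluster.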
The feature table¶
The final products of all denoising and clustering methods/workflows are a FeatureTable[Frequency] (feature table) artifact and a FeatureData[Sequence] (representative sequences) artifact.
These are two of the most important artifact classes in a marker gene sequencing workflow, and are used for many downstream analyses, as discussed below.
Indeed, feature tables are crucial to any QIIME 2 analysis, as the central record of the counts of features per sample.
Such an important artifact deserves its own powerful plugin:
q2-feature-table plugin documentation
feature-table¶
This is a QIIME 2 plugin supporting operations on sample by feature tables, such as filtering, merging, and transforming tables.
- version: 2024.10.0
- website: https://github.com/qiime2/q2-feature-table
- user support: Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
Actions¶
Name | Type | Short Description |
---|---|---|
rarefy | method | Rarefy table |
subsample-ids | method | Subsample table |
presence-absence | method | Convert to presence/absence |
relative-frequency | method | Convert to relative frequencies |
transpose | method | Transpose a feature table. |
group | method | Group samples or features by a metadata column |
merge | method | Combine multiple tables |
merge-seqs | method | Combine collections of feature sequences |
merge-taxa | method | Combine collections of feature taxonomies |
rename-ids | method | Renames sample or feature ids in a table |
filter-samples | method | Filter samples from table |
filter-features-conditionally | method | Filter features from a table based on abundance and prevalence |
filter-features | method | Filter features from table |
filter-seqs | method | Filter features from sequences |
split | method | Split one feature table into many |
tabulate-feature-frequencies | method | Tabulate feature frequencies |
tabulate-sample-frequencies | method | Tabulate sample frequencies |
summarize | visualizer | Summarize table |
tabulate-seqs | visualizer | View sequence associated with each feature |
core-features | visualizer | Identify core features in table |
heatmap | visualizer | Generate a heatmap representation of a feature table |
summarize-plus | pipeline | Summarize table plus |
We will not discuss all actions of this plugin in detail here (some are mentioned below), but it performs many useful operations on feature tables, so familiarize yourself with its documentation!
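To make the feature table concrete, here is a toy table as a Python dict, with sketches of two operations in the spirit of relative-frequency and filter-samples (illustrative only, with hypothetical sample and feature IDs; the real actions operate on FeatureTable artifacts):

```python
# A feature table as a nested dict: sample -> feature -> count.
table = {
    "sample-1": {"ASV1": 30, "ASV2": 10},
    "sample-2": {"ASV1": 2, "ASV2": 2},
}

def relative_frequency(table):
    """Analogue of feature-table relative-frequency: counts -> proportions."""
    out = {}
    for sample, feats in table.items():
        total = sum(feats.values())
        out[sample] = {f: c / total for f, c in feats.items()}
    return out

def filter_samples(table, min_frequency):
    """Analogue of feature-table filter-samples: drop low-depth samples."""
    return {s: f for s, f in table.items() if sum(f.values()) >= min_frequency}

print(relative_frequency(table)["sample-1"])       # {'ASV1': 0.75, 'ASV2': 0.25}
print(list(filter_samples(table, min_frequency=10)))  # ['sample-1']
```

Most of the actions in the table above are variations on this theme: transform, filter, group, or merge the sample-by-feature count matrix.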
Congratulations! You’ve made it through importing, demultiplexing, and denoising/clustering your data, which are the most complicated and difficult steps for most users (if only because there are so many ways to do it!). If you’ve made it this far, the rest should be easy. Now begins the fun. 🍾
Taxonomy classification (or annotation) and taxonomic analyses¶
For many experiments, investigators aim to identify the organisms that are present in a sample. For example:
- How do the genera or species in a system change over time?
- Are there any potential human pathogens in this patient’s sample?
- What’s swimming in my wine? 🍷🤑
We can do this by comparing our feature sequences (be they ASVs or OTUs) to a reference database of sequences with known taxonomic composition. Simply finding the closest alignment is not really good enough -- because other sequences that are equally close matches or nearly as close may have different taxonomic annotations. So we use taxonomy classifiers to determine the closest taxonomic affiliation with some degree of confidence or consensus (which may not be a species name if one cannot be predicted with certainty!), based on alignment, k-mer frequencies, etc. Those interested in learning more about the relative performance of the taxonomy classifiers in QIIME 2 can read until the cows come home. And if you want to learn about how the algorithms work, you can refer to the Sequence Homology Searching chapter of An Introduction to Applied Bioinformatics. 🐄🐄🐄
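To see why consensus matters, here is a toy consensus assigner in Python. It keeps a taxonomic rank only while enough of the top database hits agree, so an ambiguous species is left unassigned (purely illustrative; the taxonomy strings are hypothetical and real consensus classifiers are more careful, e.g., about hits with different taxonomy depths):

```python
def consensus_taxonomy(hits, min_fraction=0.51):
    """Keep each semicolon-delimited rank only while >= min_fraction of the
    top hits agree on it; stop at the first rank without consensus."""
    split = [t.split(";") for t in hits]
    consensus = []
    for rank_values in zip(*split):
        counts = {}
        for v in rank_values:
            counts[v] = counts.get(v, 0) + 1
        value, count = max(counts.items(), key=lambda kv: kv[1])
        if count / len(hits) >= min_fraction:
            consensus.append(value)
        else:
            break  # no agreement at this rank -> leave deeper ranks unassigned
    return ";".join(consensus)

hits = [
    "k__Bacteria;g__Lactobacillus;s__crispatus",
    "k__Bacteria;g__Lactobacillus;s__iners",
    "k__Bacteria;g__Lactobacillus;s__gasseri",
]
print(consensus_taxonomy(hits))  # k__Bacteria;g__Lactobacillus
```

The three hits agree at the genus level but not on a species, so the toy assigner confidently reports the genus and nothing deeper — exactly the "may not be a species name" behavior described above.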
Figure 5 shows what a taxonomy classification workflow might look like.

Figure 5:Flowchart of taxonomic annotation workflows in QIIME 2.
Alignment-based taxonomic classification¶
q2-feature-classifier contains three different classification methods. classify-consensus-blast and classify-consensus-vsearch are both alignment-based methods that find a consensus assignment across N top hits. These methods take reference database FeatureData[Taxonomy] and FeatureData[Sequence] files directly, and do not need to be pre-trained.
Machine-learning-based taxonomic classification¶
Machine-learning-based classification methods are available through classify-sklearn, and theoretically can apply any of the classification methods available in scikit-learn.
These classifiers must be trained, e.g., to learn which features best distinguish each taxonomic group, adding an additional step to the classification process.
Classifier training is reference database- and marker-gene-specific and only needs to happen once per marker-gene/reference database combination; that classifier may then be re-used as many times as you like without needing to re-train!
Training your own feature classifiers.¶
If you’re working with an uncommon marker gene, you may need to train your own feature classifier.
This is possible following the steps in the classifier training tutorial.
The rescript plugin also contains many tools that can be useful in preparing reference data for training classifiers.
Most users don’t need to train their own classifiers, however, as the QIIME 2 developers provide classifiers to the public for common marker genes in the QIIME 2 Library.
🎅🎁🎅🎁🎅🎁
Environment-weighted classifiers¶
Typical Naive Bayes classifiers treat all reference sequences as being equally likely to be observed in a sample. Environment-weighted taxonomic classifiers, on the other hand, use public microbiome data to weight taxa by their past frequency of being observed in specific sample types. This can improve the accuracy and the resolution of marker gene classification, and we recommend using weighted classifiers when possible. You can find environment-weighted classifiers for 16S rRNA in the QIIME 2 Library. If the environment type that you’re studying isn’t one of the ones that pre-trained classifiers are provided for, the “diverse weighted” classifiers may still be relevant. These are trained on weights from multiple different environment types, and have been shown to perform better than classifiers that assume equal weights for all taxa.
Which feature classification method is best?¶
They are all pretty good, otherwise we wouldn’t bother exposing them in q2-feature-classifier. But in general classify-sklearn with a Naive Bayes classifier can slightly outperform other methods we’ve tested based on several criteria for classification of 16S rRNA gene and fungal ITS sequences.
It can be more difficult and frustrating for some users, however, since it requires that additional training step.
That training step can be memory intensive, becoming a barrier for some users who are unable to use the pre-trained classifiers.
Some users also prefer the alignment-based methods because their mode of operation is much more transparent and their parameters easier to manipulate.
Feature classification can be slow¶
Runtime of feature classifiers is a function of the number of sequences to be classified, and the number of reference sequences. If runtime is an issue for you, considering filtering low-abundance features out of your sequences file before classifying (e.g., those that are present in only a single sample), and use smaller reference databases if possible. In practice, in “normal size” sequencing experiments (whatever that means 😜) we see variations between a few minutes (a few hundred features) to hours or days (hundreds of thousands of features) for classification to complete. If you want to hang some numbers on there, check out our benchmarks for classifier runtime performance. 🏃⏱️
Feature classification can be memory intensive¶
Generally at least 8 GB of RAM are required, though 16 GB is better. The requirement is generally related to the size of the reference database, and in some cases 32 GB of RAM or more are required.
Examples of using classify-sklearn are shown in the Moving Pictures tutorial 🎥.
Figure 5 should make the other classifier methods reasonably clear.
All classifiers produce a FeatureData[Taxonomy]
artifact, tabulating the taxonomic annotation for each query sequence.
If you want to review those, or compare them across different classifiers, refer back to Reviewing information about observed sequences.
Taxonomic analysis¶
Taxonomic classification opens us up to a whole new world of possibilities. 🌎
Here are some popular actions that are enabled by having a FeatureData[Taxonomy] artifact:
- Collapse your feature table with taxa collapse! This groups all features that share the same taxonomic assignment into a single feature. That taxonomic assignment becomes the feature ID in the new feature table. This feature table can be used in all the same ways as the original. Some users may be specifically interested in performing, e.g., taxonomy-based diversity analyses, but at the very least anyone assigning taxonomy is probably interested in assessing differential abundance of those taxa. Comparing differential abundance analyses using taxa as features versus using ASVs or OTUs as features can be diagnostic and informative for various analyses.
- Plot your taxonomic composition to see the abundance of various taxa in each of your samples. Check out taxa barplot and feature-table heatmap for more details. 📊
- Filter your feature table and feature sequences to remove certain taxonomic groups. This is useful for removing known contaminants or non-target groups, e.g., host DNA including mitochondrial or chloroplast sequences. It can also be useful for focusing on specific groups for deeper analysis. See the filtering tutorial for more details and examples. 🌿🐀
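The collapse operation is easy to picture with a toy example (hypothetical ASV IDs and taxa; the real taxa collapse action operates on artifacts and lets you choose the rank to collapse at):

```python
# Hypothetical ASV table and taxonomy assignments.
table = {
    "sample-1": {"ASV1": 5, "ASV2": 3, "ASV3": 2},
}
taxonomy = {
    "ASV1": "g__Lactobacillus",
    "ASV2": "g__Lactobacillus",
    "ASV3": "g__Prevotella",
}

def collapse(table, taxonomy):
    """Analogue of taxa collapse: sum counts of features sharing a taxon;
    the taxon string becomes the new feature ID."""
    out = {}
    for sample, feats in table.items():
        collapsed = {}
        for feature, count in feats.items():
            taxon = taxonomy[feature]
            collapsed[taxon] = collapsed.get(taxon, 0) + count
        out[sample] = collapsed
    return out

print(collapse(table, taxonomy))
# {'sample-1': {'g__Lactobacillus': 8, 'g__Prevotella': 2}}
```

Two Lactobacillus ASVs become one genus-level feature with their summed counts — handy when you care about taxa rather than exact sequence variants.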
Sequence alignment and phylogenetic reconstruction¶
Some diversity metrics - notably Faith’s Phylogenetic Diversity (PD) and UniFrac - integrate the phylogenetic similarity of features. If you are sequencing phylogenetic markers (e.g., 16S rRNA genes), you can build a phylogenetic tree that can be used for computing these metrics.
The different options for aligning sequences and producing a phylogeny are shown in the flowchart below, and can be classified as de novo or reference-based. For a detailed discussion of alignment and phylogeny building, see the q2-phylogeny tutorial and q2-fragment-insertion. 🌳

Figure 6:Flowchart of alignment and phylogenetic reconstruction workflows in QIIME 2.
Now that we have our rooted phylogenetic tree (i.e., an artifact of class Phylogeny[Rooted]), let’s use it!
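To see what "integrating the phylogeny" means, here is a toy Faith's PD calculation in Python: sum the branch lengths of the subtree connecting the observed features to the root. This uses a simplified definition on a made-up four-node tree (real implementations handle much larger trees, missing tips, and more):

```python
# A tiny rooted tree as child -> (parent, branch length); names are made up.
#        root
#       /    \
#     n1      C
#    /  \
#   A    B
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("root", 3.0),
    "n1": ("root", 2.0),
}

def faith_pd(observed_tips, tree):
    """Faith's PD: total branch length of the subtree spanning the
    observed tips and the root (simplified definition)."""
    covered = set()
    for tip in observed_tips:
        node = tip
        while node in tree:  # walk up toward the root
            covered.add(node)
            node = tree[node][0]
    return sum(tree[n][1] for n in covered)

print(faith_pd({"A", "B"}, TREE))  # 1.0 + 1.0 + 2.0 = 4.0
print(faith_pd({"A", "C"}, TREE))  # 1.0 + 2.0 + 3.0 = 6.0
```

Notice that {A, C} scores higher than {A, B} even though both samples contain two features: A and C are more distantly related, so the sample spans more of the tree. That is the whole point of phylogenetic diversity metrics.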
Diversity analysis¶
In microbiome experiments, investigators frequently wonder about things like:
- How many different species/OTUs/ASVs are present in my samples?
- Which of my samples represent more phylogenetic diversity?
- Does the microbiome composition of my samples differ based on sample categories (e.g., healthy versus disease)?
- What factors (e.g., pH, elevation, blood pressure, body site, or host species just to name a few examples) are correlated with differences in microbial composition and biodiversity?
These questions can be answered by alpha- and beta-diversity analyses. Alpha diversity measures the level of diversity within individual samples. Beta diversity measures assess the dissimilarity between samples. We can then use this information to statistically test whether alpha diversity is different between groups of samples (indicating, for example, that those groups have more/less species richness) and whether beta diversity is greater across groups (indicating, for example, that samples within a group are more similar to each other than those in another group, suggesting that membership within these groups is shaping the microbial composition of those samples).
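Two of the simplest metrics make the alpha/beta distinction concrete. Here is a toy Python sketch of observed features (an alpha diversity metric) and Bray-Curtis dissimilarity (a beta diversity metric), using made-up counts (QIIME 2 offers many more metrics than these):

```python
def observed_features(counts):
    """Alpha diversity: how many features are present in one sample."""
    return sum(1 for c in counts.values() if c > 0)

def bray_curtis(a, b):
    """Beta diversity: Bray-Curtis dissimilarity between two samples
    (0 = identical composition, 1 = no shared features)."""
    features = set(a) | set(b)
    shared = sum(min(a.get(f, 0), b.get(f, 0)) for f in features)
    total = sum(a.values()) + sum(b.values())
    return 1 - 2 * shared / total

s1 = {"ASV1": 6, "ASV2": 4}
s2 = {"ASV1": 2, "ASV3": 8}
print(observed_features(s1))  # 2
print(bray_curtis(s1, s2))    # 1 - 2*2/20 = 0.8
```

Alpha metrics map each sample to a single number; beta metrics map each pair of samples to a number, which is why beta diversity results naturally live in a distance matrix.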
Different types of diversity analyses in QIIME 2 are exemplified in the Moving Pictures tutorial 🎥. The actions used to generate diversity artifacts are shown in Figure 7, and many other tools can operate on these results.

Figure 7:Flowchart of diversity analysis workflows in QIIME 2.
The q2-diversity plugin contains many different useful actions. Check them out to learn more.
As you can see in the flowchart, the diversity core-metrics* pipelines (core-metrics and core-metrics-phylogenetic) encompass many different core diversity commands, and in the process produce the main diversity-related artifacts that can be used in downstream analyses.
These are:
- SampleData[AlphaDiversity] artifacts, which contain alpha diversity estimates for each sample in your feature table. This is the chief artifact for alpha diversity analyses.
- DistanceMatrix artifacts, containing the pairwise distance/dissimilarity between each pair of samples in your feature table. This is the chief artifact for beta diversity analyses.
- PCoAResults artifacts, containing principal coordinates ordination results for each distance/dissimilarity metric. Principal coordinates analysis is a dimension reduction technique, facilitating visual comparisons of sample (dis)similarities in 2D or 3D space. Learn more about ordination in Ordination Methods for Ecologists and in the Machine learning in bioinformatics section of An Introduction to Applied Bioinformatics (Bolyen et al., 2018).
These are the main diversity-related artifacts.
We can re-use these data in all sorts of downstream analyses, or in the various actions of q2-diversity shown in the flowchart.
Many of these actions are demonstrated in the Moving Pictures tutorial 🎥 so head on over there to learn more!
Note that there are many different alpha- and beta-diversity metrics that are available in QIIME 2. To learn more (and figure out whose paper you should be citing!), check out this neat resource, which was contributed by a friendly QIIME 2 user to enlighten all of us. Thanks Stephanie! 😁🙏😁🙏😁🙏
Fun with feature tables¶
At this point you have a feature table, taxonomy classification results, alpha diversity, and beta diversity results. Oh my! 🦁🐯🐻
Taxonomic and diversity analyses, as described above, are the basic types of analyses that most QIIME 2 users are probably going to need to perform at some point. However, this is only the beginning, and there are so many more advanced analyses at our fingertips. 🖐️⌨️

Figure 8:Flowchart of “downstream” analysis workflows in QIIME 2. Note: this figure needs some updates; gneiss was deprecated and is no longer part of the amplicon distribution.
We are only going to give a brief overview, since each of these analyses has its own in-depth tutorial to guide us:
- Analyze longitudinal data: q2-longitudinal is a plugin for performing statistical analyses of longitudinal experiments, i.e., where samples are collected from individual patients/subjects/sites repeatedly over time. This includes longitudinal studies of alpha and beta diversity, and some really awesome, interactive plots. 📈🍝
- Predict the future (or the past) 🔮: q2-sample-classifier is a plugin for machine-learning 🤖 analyses of feature data. Both classification and regression models are supported. This allows you to do things like:
  - predict sample metadata as a function of feature data (e.g., can we use a fecal sample to predict cancer susceptibility? Or predict wine quality based on the microbial composition of grapes before fermentation?). 🍇
  - identify features that are predictive of different sample characteristics. 🚀
  - quantify rates of microbial maturation (e.g., to track normal microbiome development in the infant gut and the impacts of persistent malnutrition or antibiotics, diet, and delivery mode). 👶
  - predict outliers and mislabeled samples. 👹
- Differential abundance testing is used to determine which features are significantly more/less abundant in different groups of samples. QIIME 2 currently supports a few different approaches to differential abundance testing, including ancom-bc in q2-composition. 👾👾👾
- Evaluate and control data quality: q2-quality-control is a plugin for evaluating and controlling sequence data quality. This includes actions that:
  - test the accuracy of different bioinformatic or molecular methods, or of run-to-run quality variation. These actions are typically used if users have samples with known compositions, e.g., mock communities, since accuracy is calculated as the similarity between the observed and expected compositions, sequences, etc. But more creative uses may be possible...
  - filter sequences based on alignment to a reference database, or that contain specific short sections of DNA (e.g., primer sequences). This is useful for removing sequences that match a specific group of organisms, non-target DNA, or other nonsense. 🙃
And that’s just a brief overview! QIIME 2 continues to grow, so stay tuned for more plugins in future releases 📻, and keep your eyes peeled for stand-alone plugins that will continue to expand the functionality available in QIIME 2.
A good next step is to work through the Moving Pictures tutorial 🎥, if you haven’t done so already. That will help you learn how to actually use all of the functionality discussed here on real microbiome sequence data.
Now go forth and have fun! 💃
This is a guide for novice QIIME 2 users, and particularly for those who are new to microbiome research. For experienced users who are already well versed in microbiome analysis (and those who are averse to uncontrolled use of emoji) mosey on over to .
Welcome all newcomers! 👋 This guide will give you a conceptual overview of many of the plugins and actions available in QIIME 2, and guide you to relevant documentation for deeper exploration. As an Explanation article, this document doesn’t provide specific commands to run, but rather discusses at a higher level what your analysis workflow might entail. If you want specific commands that you can run and then adapt for your own work, our Tutorial articles are more aligned with what you’re looking for. We generally recommend starting with the Moving Pictures tutorial 🎥.
Consider this document to be your treasure map: QIIME 2 actions are the stepping stones on your path, and the flowcharts below will tell you where all the goodies are buried. 🗺️
Remember, many paths lead from the foot of the mountain, but at the peak we all gaze at the same moon. 🌕
Let’s get oriented¶
Flowcharts¶
Before we begin talking about specific plugins and actions, we will discuss a conceptual overview of a typical workflow for analyzing marker gene sequence data. And before we look at that overview, we must look at the key to our treasure map:

Figure 1:Each type of result (i.e., Artifacts and Visualizations) and action (i.e., methods, visualizers, and pipelines) is represented by a different color-coded node. The edges connecting each node are either solid (representing either required input or output) or dashed (representing optional input).
In the flowcharts below:
- Actions are labeled with the name of the plugin and the name of the action. To learn more about how to use a specific plugin and action, you can look it up in Available plugins.
- Artifacts are labeled by their artifact class.
- Visualizations are variously labeled as “visualization,” some name that represents the information shown in that visualization, or replaced with an image representing some of the tasty information you might find inside that visualization... 🍙
Useful points for beginners¶
Just a few more important points before we go further:
- The guide below is not exhaustive by any means.
It only covers some of the chief actions in the QIIME 2 amplicon distribution.
There are many more actions and plugins to discover.
Curious to learn more?
Refer to Available plugins, or if you’re working on the command line, call
qiime --help
. - The flowcharts below are designed to be as simple as possible, and hence omit many of the inputs (particularly optional inputs and metadata) and outputs (particularly statistical summaries and other minor outputs) and all of the possible parameters from most actions. Many additional actions (e.g., for displaying statistical summaries or fiddling with feature tables 🎻) are also omitted. Now that you know all about the help documentation (Available plugins), use it to learn more about individual actions, and other actions present in a plugin (hint: if a plugin has additional actions not described here, they are probably used to examine the output of other actions in that plugin).
- Metadata is a central concept in QIIME 2. We do not extensively discuss metadata in this guide. Instead, find discussion of metadata in .
- There is no one way to do things in QIIME 2. Nor is there a “QIIME 2” approach. Many paths lead from the foot of the mountain... ⛰️ Many of the plugins and actions in QIIME 2 wrap independent software or pre-existing methods. The QIIME 2 Framework (Q2F), discussed in Using QIIME 2, is the glue that makes the magic happen.
- Do not forget to cite appropriately! Unsure what to cite? To see the a plugin or method’s relevant citations, refer its help text. Or, better yet, view an artifact or visualization using QIIME 2 View. The “citations” tab will contain information on all relevant citations used for the generation of that file. Groovy. 😎
💃💃💃
Conceptual overview¶
Now let us examine a conceptual overview of the various possible workflows for examining marker gene sequence data Figure 2. QIIME 2 allows you to enter or exit anywhere you’d like, so you can use QIIME 2 for any or all of these steps.

Figure 2:Flowchart providing an overview of a typical QIIME 2-based microbiome marker gene analysis workflow. The edges and nodes in this overview do not represent specific actions or data types, but instead represent conceptual categories, e.g., the basic types of data or analytical goals we might have in an experiment. Discussion of these steps and terms follows.
All data must be imported into a QIIME 2 artifact to be used by a QIIME 2 action (with the exception of metadata).
Most users start with either multiplexed (e.g., between one and three FASTQ files) or demuliplexed (e.g., a collection of n
.fastq
files, where n
is the number of samples, or two-times the number of samples) raw sequence data.
If possible, we recommend starting with demultiplexed sequence data - this prevents you from having to understand how sequences were multiplexed and how they need to be demultiplexed.
Whoever did your sequencing should already have that information and know how to do this.
Others users may start downstream, because some data processing has already been performed. For example, you can also start your QIIME 2 analysis with a feature table (.biom
or .tsv
file) generated with some other tool.
How to import and export data helps you identify what type of data you have, and provides specific instructions on importing different types of data.
Now that we understand that we can actually enter into this overview workflow at nearly any of the nodes, let us walk through individual sections.
- All marker gene sequencing experiments begin, at some point or another, as multiplexed sequence data.
This is probably in
.fastq
files that containing DNA sequences and quality scores for each base. - The sequence data must be demultiplexed, such that each observed sequence read is associated with the sample that it was observed in, or discarded if its sample of origin could not be determined.
- Reads then undergo quality control (i.e., denoising), and amplicon sequence variants (ASVs) or operational taxonomic units (OTUs) should be defined. The goals of these steps are to remove sequencing errors and to dereplicate sequences to make downstream analyses more performant. These steps result in: a. a feature table that tabulates counts of ASVs (or OTUs) on a per-sample basis, and b. feature sequences - a mapping of ASV (or OTU) identifiers to the sequences they represent.
These artifacts (the feature table and feature sequences) are central to most downstream analyses. Common analyses include:
- Taxonomic annotation of sequences, which lets you determine with taxa (e.g., species, genera, phyla) are present.
- Alpha and beta diversity analyses, or measures of diversity within and between samples, respectively. These enable assessment of how similar or different samples are to one another. Some diversity metrics integrate measures of phylogenetic similarity between individual features. If you are sequencing phylogenetic markers (e.g., 16S rRNA genes), you can construct a phylogenetic tree from your feature sequences to use when calculating phylogenetic diversity metrics.
- Differential abundance testing, to determine which features (OTUs, ASVs, taxa, etc) are significantly more/less abundant in different experimental groups.
This is just the beginning, and many other statistical tests and plotting methods are at your finger tips in QIIME 2 and in the lands beyond. The world is your oyster. Let’s dive in. 🏊
Demultiplexing¶
Okay! Imagine we have just received some FASTQ data, hot off the sequencing instrument. Most next-gen sequencing instruments have the capacity to analyze hundreds or even thousands of samples in a single lane/run; we do so by multiplexing these samples, which is just a fancy word for mixing a whole bunch of stuff together. How do we know which sample each read came from? This is typically done by appending a unique barcode (a.k.a. index or tag) sequence to one or both ends of each sequence. Detecting these barcode sequences and mapping them back to the samples they belong to allows us to demultiplex our sequences.
You (or whoever prepared and sequenced your samples) should know which barcode is associated with each sample -- if you do not know, talk to your lab mates or sequencing center. Include this barcode information in your sample metadata file.
The process of demultiplexing (as it occurs in QIIME 2) will look something like Figure 3 (ignore the right-hand side of this flow chart for now).

Figure 3:Flowchart of demultiplexing and denoising workflows in QIIME 2.
This flowchart describes all demultiplexing steps that are currently possible in QIIME 2, depending on the type of raw data you have imported.
Usually only one of the different demultiplexing actions available in q2-demux
or q2-cutadapt
will be applicable for your data, and that is all you will need.
Read more about demultiplexing and give it a spin with the Moving Pictures tutorial 🎥. That tutorials covers the Earth Microbiome Project format data.
If instead you have barcodes and primers in-line in your reads, see the [cutadapt tutorials](https://q2-cutadapt
.
Have dual-indexed reads or mixed-orientation reads or some other unusual format? Search the QIIME 2 Forum for advice.
Paired-end read joining¶
If you’re working with Illumina paired-end reads, they will typically need to be joined at some point in the analysis.
If you read How to merge Illumina paired-end reads, you will see that this happens automatically during denoising with q2-dada2
.
However, if you want to use q2-deblur
or an OTU clustering method (as described in more detail below), use q2-vsearch
to join these reads before proceeding, as shown in the Figure 3.
If you are beginning to pull your hair and foam at the mouth, do not despair: QIIME 2 tends to get easier the further we travel in the “general overview” (Figure 2). Importing and demultiplexing raw sequencing data happens to be the most frustrating part for most new users because there are so many different ways that marker gene data can be generated. But once you get the hang of it, it’s a piece of cake. 🍰
Denoising and clustering¶
Congratulations on getting this far! Denoising and clustering steps are slightly less confusing than importing and demultiplexing! 🎉😬🎉
The names for these steps are very descriptive:
- We denoise our sequences to remove and/or correct noisy reads. 🔊
- We dereplicate our sequences to reduce repetition and file size/memory requirements in downstream steps (don’t worry! we keep count of each replicate). 🕵️
- We (optionally) cluster sequences to collapse similar sequences (e.g., those that are ≥ 97% similar to each other) into single replicate sequences. This process, also known as OTU picking, was once a common procedure, used to simultaneously dereplicate but also perform a sort of quick-and-dirty denoising procedure (to capture stochastic sequencing and PCR errors, which should be rare and similar to more abundant centroid sequences). Skip clustering in favor of denoising, unless you have a really strong reason not to.
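Dereplication is the easiest of these to picture. Here is a minimal sketch (plain Python, with hypothetical sample names): identical reads collapse to a single unique sequence, and we keep the per-sample counts.

```python
# Conceptual sketch of dereplication: collapse identical reads into unique
# sequences while keeping per-sample counts. The result is essentially a tiny
# feature table: unique sequence -> count, per sample.
from collections import Counter

reads_per_sample = {
    "sample-A": ["ACGT", "ACGT", "GGCC", "ACGT"],
    "sample-B": ["GGCC", "GGCC", "TTAA"],
}

feature_table = {sample: Counter(reads)
                 for sample, reads in reads_per_sample.items()}

print(feature_table["sample-A"]["ACGT"])  # 3 copies of ACGT in sample-A
```

Notice that no information about abundance is lost: the counts of each replicate are retained, only the redundant copies of the sequence itself are discarded.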
Denoising¶
Let’s start with denoising, which is depicted on the right-hand side of Figure 3.
The denoising methods currently available in QIIME 2 include DADA2 and Deblur.
You can learn more about those methods by reading the original publications for each.
Examples of using both are presented in the Moving Pictures tutorial 🎥.
Note that Deblur (and also vsearch dereplicate-sequences) should be preceded by basic quality-score-based filtering, but this is unnecessary for DADA2.
Both Deblur and DADA2 contain internal chimera checking methods and abundance filtering, so additional filtering should not be necessary following these methods.
🦁🐐🐍
To put it simply, these methods filter out noisy sequences, correct errors in marginal sequences (in the case of DADA2), remove chimeric sequences, remove singletons, join denoised paired-end reads (in the case of DADA2), and then dereplicate those sequences. 😎
The features produced by denoising methods go by many names, usually some variant of “sequence variant” (SV), “amplicon SV” (ASV), “actual SV”, “exact SV”... We tend to use amplicon sequence variant (ASV) in the QIIME 2 documentation, and we’ll stick with that here. 📏
Clustering¶
Next we will discuss clustering methods. Dereplication (the simplest clustering method, effectively producing 100% OTUs, i.e., all unique sequences observed in the dataset) is depicted in Figure 4, and is the necessary starting point for all other clustering methods in QIIME 2.

Figure 4:Flowchart of OTU clustering, chimera filtering, and abundance filtering workflows in QIIME 2.
q2-vsearch implements three different OTU clustering strategies: de novo, closed reference, and open reference. All should be preceded by basic quality-score-based filtering and followed by chimera filtering and aggressive OTU filtering (the treacherous trio, a.k.a. the Bokulich method). 🙈🙉🙊
A separate tutorial demonstrates use of several q2-vsearch clustering methods.
Don’t forget to read the chimera filtering tutorial as well.
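To make the clustering idea concrete, here is a toy sketch of greedy de novo clustering (plain Python, not the vsearch algorithm): each sequence either joins the first existing cluster whose centroid it is similar enough to, or founds a new cluster. Real tools compute alignment-based identity; for brevity this toy uses positional identity on equal-length sequences.

```python
# Toy greedy de novo OTU clustering at a similarity threshold (e.g. 97%).
# Similarity here is simple positional identity, an illustrative stand-in for
# the alignment-based identity that real clustering tools compute.
def identity(a, b):
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))

def cluster(seqs, threshold=0.97):
    centroids = []    # one representative (centroid) sequence per cluster
    assignments = {}  # sequence -> index of the cluster it joined
    for seq in seqs:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                assignments[seq] = i
                break
        else:  # no centroid was similar enough: start a new cluster
            centroids.append(seq)
            assignments[seq] = len(centroids) - 1
    return centroids, assignments

s1 = "ACGT" * 10
s2 = s1[:-1] + "A"  # one mismatch -> 97.5% identical to s1
s3 = "TTTT" * 10
centroids, assignments = cluster([s1, s2, s3])
print(len(centroids))  # 2 clusters
```

This also illustrates why greedy clustering is order-dependent (input order can change which sequences become centroids), one of the known drawbacks of de novo OTU picking relative to denoising.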
The feature table¶
The final products of all denoising and clustering methods/workflows are a FeatureTable (feature table) artifact and a FeatureData[Sequence] (representative sequences) artifact.
These are two of the most important artifact classes in a marker gene sequencing workflow, and are used for many downstream analyses, as discussed below.
Indeed, feature tables are crucial to any QIIME 2 analysis, as the central record of the counts of features per sample.
Such an important artifact deserves its own powerful plugin:
q2-feature-table plugin documentation
feature-table¶
This is a QIIME 2 plugin supporting operations on sample-by-feature tables, such as filtering, merging, and transforming tables.
- version: 2024.10.0
- website: https://github.com/qiime2/q2-feature-table
- user support: Please post to the QIIME 2 Forum for help with this plugin: https://forum.qiime2.org
Actions¶
| Name | Type | Short Description |
|---|---|---|
| rarefy | method | Rarefy table |
| subsample-ids | method | Subsample table |
| presence-absence | method | Convert to presence/absence |
| relative-frequency | method | Convert to relative frequencies |
| transpose | method | Transpose a feature table |
| group | method | Group samples or features by a metadata column |
| merge | method | Combine multiple tables |
| merge-seqs | method | Combine collections of feature sequences |
| merge-taxa | method | Combine collections of feature taxonomies |
| rename-ids | method | Rename sample or feature ids in a table |
| filter-samples | method | Filter samples from table |
| filter-features-conditionally | method | Filter features from a table based on abundance and prevalence |
| filter-features | method | Filter features from table |
| filter-seqs | method | Filter features from sequences |
| split | method | Split one feature table into many |
| tabulate-feature-frequencies | method | Tabulate feature frequencies |
| tabulate-sample-frequencies | method | Tabulate sample frequencies |
| summarize | visualizer | Summarize table |
| tabulate-seqs | visualizer | View sequence associated with each feature |
| core-features | visualizer | Identify core features in table |
| heatmap | visualizer | Generate a heatmap representation of a feature table |
| summarize-plus | pipeline | Summarize table plus |
We will not discuss all actions of this plugin in detail here (some are mentioned below), but it performs many useful operations on feature tables, so familiarize yourself with its documentation!
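Two of the simplest transforms in that table are easy to sketch conceptually (in plain Python, with hypothetical feature ids): relative-frequency divides each count by its sample's total, and presence-absence records only whether a feature was observed.

```python
# Conceptual sketches of two feature-table transforms. A feature table here is
# sample -> {feature id: count}; the ids and counts are made up.
table = {
    "sample-A": {"asv1": 30, "asv2": 10, "asv3": 0},
    "sample-B": {"asv1": 5, "asv2": 0, "asv3": 45},
}

def relative_frequency(table):
    # Each count becomes a proportion of its sample's total.
    out = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        out[sample] = {f: c / total for f, c in counts.items()}
    return out

def presence_absence(table):
    # Each count becomes True (observed) or False (not observed).
    return {sample: {f: c > 0 for f, c in counts.items()}
            for sample, counts in table.items()}

print(relative_frequency(table)["sample-A"]["asv1"])  # 0.75
```

Transforms like these matter because many downstream statistics assume a particular form of the table (e.g., proportions rather than raw counts), so knowing which form each analysis expects will save you headaches.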
Congratulations! You’ve made it through importing, demultiplexing, and denoising/clustering your data, which are the most complicated and difficult steps for most users (if only because there are so many ways to do it!). If you’ve made it this far, the rest should be easy. Now begins the fun. 🍾
Taxonomy classification (or annotation) and taxonomic analyses¶
For many experiments, investigators aim to identify the organisms that are present in a sample. For example:
- How do the genera or species in a system change over time?
- Are there any potential human pathogens in this patient’s sample?
- What’s swimming in my wine? 🍷🤑
We can do this by comparing our feature sequences (be they ASVs or OTUs) to a reference database of sequences with known taxonomic composition. Simply finding the closest alignment is not really good enough -- other sequences that are equally close matches, or nearly as close, may have different taxonomic annotations. So we use taxonomy classifiers to determine the closest taxonomic affiliation with some degree of confidence or consensus (which may not be a species name if one cannot be predicted with certainty!), based on alignment, k-mer frequencies, etc. Those interested in learning more about the relative performance of the taxonomy classifiers in QIIME 2 can read until the cows come home. And if you want to learn about how the algorithms work, you can refer to the Sequence Homology Searching chapter of An Introduction to Applied Bioinformatics. 🐄🐄🐄
Figure 5 shows what a taxonomy classification workflow might look like.

Figure 5:Flowchart of taxonomic annotation workflows in QIIME 2.
Alignment-based taxonomic classification¶
q2-feature-classifier contains three different classification methods. classify-consensus-blast and classify-consensus-vsearch are both alignment-based methods that find a consensus assignment across N top hits. These methods take reference database FeatureData[Taxonomy] and FeatureData[Sequence] files directly, and do not need to be pre-trained.
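The consensus idea can be sketched in a few lines (plain Python, with made-up taxonomy strings, not the actual q2-feature-classifier algorithm): walk down the taxonomic ranks and keep each level only while enough of the top hits agree on it.

```python
# Toy sketch of consensus taxonomy assignment across top database hits.
# Taxonomy strings are hypothetical; ranks are separated by "; ".
def consensus_taxonomy(hits, min_fraction=0.51):
    levels = [h.split("; ") for h in hits]
    consensus = []
    for rank in zip(*levels):  # walk ranks from kingdom downward
        best = max(set(rank), key=rank.count)
        if rank.count(best) / len(hits) >= min_fraction:
            consensus.append(best)
        else:
            break  # hits disagree at this rank: stop, keep the coarser label
    return "; ".join(consensus)

hits = [
    "k__Bacteria; p__Firmicutes; g__Lactobacillus",
    "k__Bacteria; p__Firmicutes; g__Lactobacillus",
    "k__Bacteria; p__Firmicutes; g__Streptococcus",
]
print(consensus_taxonomy(hits))
```

This is why consensus classifiers sometimes return only a family- or phylum-level label: the top hits simply did not agree at a finer rank, and a coarser but confident answer is preferred to a precise guess.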
Machine-learning-based taxonomic classification¶
Machine-learning-based classification methods are available through classify-sklearn, and theoretically can apply any of the classification methods available in scikit-learn.
These classifiers must be trained, e.g., to learn which features best distinguish each taxonomic group, adding an additional step to the classification process.
Classifier training is reference database- and marker-gene-specific and only needs to happen once per marker-gene/reference database combination; that classifier may then be re-used as many times as you like without needing to re-train!
Training your own feature classifiers¶
If you’re working with an uncommon marker gene, you may need to train your own feature classifier.
This is possible following the steps in the classifier training tutorial.
The rescript plugin also contains many tools that can be useful in preparing reference data for training classifiers.
Most users don’t need to train their own classifiers, however, as the QIIME 2 developers provide classifiers to the public for common marker genes in the QIIME 2 Library.
🎅🎁🎅🎁🎅🎁
Environment-weighted classifiers¶
Typical Naive Bayes classifiers treat all reference sequences as being equally likely to be observed in a sample. Environment-weighted taxonomic classifiers, on the other hand, use public microbiome data to weight taxa by their past frequency of being observed in specific sample types. This can improve the accuracy and the resolution of marker gene classification, and we recommend using weighted classifiers when possible. You can find environment-weighted classifiers for 16S rRNA in the QIIME 2 Library. If the environment type that you’re studying isn’t one of the ones that pre-trained classifiers are provided for, the “diverse weighted” classifiers may still be relevant. These are trained on weights from multiple different environment types, and have been shown to perform better than classifiers that assume equal weights for all taxa.
Which feature classification method is best?¶
They are all pretty good; otherwise we wouldn’t bother exposing them in q2-feature-classifier. But in general, classify-sklearn with a Naive Bayes classifier can slightly outperform other methods we’ve tested, based on several criteria, for classification of 16S rRNA gene and fungal ITS sequences.
It can be more difficult and frustrating for some users, however, since it requires that additional training step.
That training step can be memory intensive, becoming a barrier for some users who are unable to use the pre-trained classifiers.
Some users also prefer the alignment-based methods because their mode of operation is much more transparent and their parameters easier to manipulate.
Feature classification can be slow¶
Runtime of feature classifiers is a function of the number of sequences to be classified, and the number of reference sequences. If runtime is an issue for you, considering filtering low-abundance features out of your sequences file before classifying (e.g., those that are present in only a single sample), and use smaller reference databases if possible. In practice, in “normal size” sequencing experiments (whatever that means 😜) we see variations between a few minutes (a few hundred features) to hours or days (hundreds of thousands of features) for classification to complete. If you want to hang some numbers on there, check out our benchmarks for classifier runtime performance. 🏃⏱️
Feature classification can be memory intensive¶
Generally at least 8 GB of RAM are required, though 16 GB is better. Memory requirements are generally related to the size of the reference database, and in some cases 32 GB of RAM or more are required.
Examples of using classify-sklearn are shown in the Moving Pictures tutorial 🎥.
Figure 5 should make the other classifier methods reasonably clear.
All classifiers produce a FeatureData[Taxonomy] artifact, tabulating the taxonomic annotation for each query sequence.
If you want to review those, or compare them across different classifiers, refer back to Reviewing information about observed sequences.
Taxonomic analysis¶
Taxonomic classification opens us up to a whole new world of possibilities. 🌎
Here are some popular actions that are enabled by having a FeatureData[Taxonomy] artifact:
- Collapse your feature table with taxa collapse! This groups all features that share the same taxonomic assignment into a single feature. That taxonomic assignment becomes the feature ID in the new feature table. This feature table can be used in all the same ways as the original. Some users may be specifically interested in performing, e.g., taxonomy-based diversity analyses, but at the very least anyone assigning taxonomy is probably interested in assessing differential abundance of those taxa. Comparing differential abundance analyses using taxa as features versus using ASVs or OTUs as features can be diagnostic and informative for various analyses.
- Plot your taxonomic composition to see the abundance of various taxa in each of your samples. Check out taxa barplot and feature-table heatmap for more details. 📊
- Filter your feature table and feature sequences to remove certain taxonomic groups. This is useful for removing known contaminants or non-target groups, e.g., host DNA including mitochondrial or chloroplast sequences. It can also be useful for focusing on specific groups for deeper analysis. See the filtering tutorial for more details and examples. 🌿🐀
Sequence alignment and phylogenetic reconstruction¶
Some diversity metrics - notably Faith’s Phylogenetic Diversity (PD) and UniFrac - integrate the phylogenetic similarity of features. If you are sequencing phylogenetic markers (e.g., 16S rRNA genes), you can build a phylogenetic tree that can be used for computing these metrics.
The different options for aligning sequences and producing a phylogeny are shown in the flowchart below, and can be classified as de novo or reference-based. For a detailed discussion of alignment and phylogeny building, see the q2-phylogeny tutorial and q2-fragment-insertion. 🌳

Figure 6:Flowchart of alignment and phylogenetic reconstruction workflows in QIIME 2.
Now that we have our rooted phylogenetic tree (i.e., an artifact of class Phylogeny[Rooted]), let’s use it!
Diversity analysis¶
In microbiome experiments, investigators frequently wonder about things like:
- How many different species/OTUs/ASVs are present in my samples?
- Which of my samples represent more phylogenetic diversity?
- Does the microbiome composition of my samples differ based on sample categories (e.g., healthy versus disease)?
- What factors (e.g., pH, elevation, blood pressure, body site, or host species just to name a few examples) are correlated with differences in microbial composition and biodiversity?
These questions can be answered by alpha- and beta-diversity analyses. Alpha diversity measures the level of diversity within individual samples. Beta diversity measures assess the dissimilarity between samples. We can then use this information to statistically test whether alpha diversity is different between groups of samples (indicating, for example, that those groups have more/less species richness) and whether beta diversity is greater across groups (indicating, for example, that samples within a group are more similar to each other than those in another group, suggesting that membership within these groups is shaping the microbial composition of those samples).
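To ground these ideas, here is a toy illustration (plain Python, not q2-diversity) of one alpha metric, observed features, and one beta metric, Bray-Curtis dissimilarity, computed on a tiny hypothetical feature table:

```python
# Toy alpha and beta diversity on two hypothetical samples.
a = {"asv1": 10, "asv2": 0, "asv3": 5}
b = {"asv1": 2, "asv2": 8, "asv3": 5}

def observed_features(counts):
    # Alpha diversity: how many features are present in this sample?
    return sum(1 for c in counts.values() if c > 0)

def bray_curtis(x, y):
    # Beta diversity: 0 = identical composition, 1 = no features shared.
    features = set(x) | set(y)
    shared = sum(min(x.get(f, 0), y.get(f, 0)) for f in features)
    total = sum(x.values()) + sum(y.values())
    return 1 - 2 * shared / total

print(observed_features(a))            # 2
print(round(bray_curtis(a, b), 3))     # 0.533
```

Phylogenetic metrics like Faith's PD and UniFrac follow the same pattern but additionally weight features by the branch lengths separating them on the tree, which is why they require the Phylogeny[Rooted] artifact discussed above.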
Different types of diversity analyses in QIIME 2 are exemplified in the Moving Pictures tutorial 🎥. The actions used to generate diversity artifacts are shown in Figure 7, and many other tools can operate on these results.

Figure 7:Flowchart of diversity analysis workflows in QIIME 2.
The q2-diversity plugin contains many different useful actions; check them out to learn more.
As you can see in the flowchart, the diversity core-metrics* pipelines (core-metrics and core-metrics-phylogenetic) encompass many different core diversity commands, and in the process produce the main diversity-related artifacts that can be used in downstream analyses. These are:
- SampleData[AlphaDiversity] artifacts, which contain alpha diversity estimates for each sample in your feature table. This is the chief artifact for alpha diversity analyses.
- DistanceMatrix artifacts, containing the pairwise distance/dissimilarity between each pair of samples in your feature table. This is the chief artifact for beta diversity analyses.
- PCoAResults artifacts, containing principal coordinates ordination results for each distance/dissimilarity metric. Principal coordinates analysis is a dimension reduction technique, facilitating visual comparisons of sample (dis)similarities in 2D or 3D space. Learn more about ordination in Ordination Methods for Ecologists and in the Machine learning in bioinformatics section of An Introduction to Applied Bioinformatics (Bolyen et al., 2018).

We can re-use these artifacts in all sorts of downstream analyses, or in the various actions of q2-diversity shown in the flowchart.
Many of these actions are demonstrated in the Moving Pictures tutorial 🎥 so head on over there to learn more!
Note that there are many different alpha- and beta-diversity metrics that are available in QIIME 2. To learn more (and figure out whose paper you should be citing!), check out that neat resource, which was contributed by a friendly QIIME 2 user to enlighten all of us. Thanks Stephanie! 😁🙏😁🙏😁🙏
Fun with feature tables¶
At this point you have a feature table, taxonomy classification results, alpha diversity, and beta diversity results. Oh my! 🦁🐯🐻
Taxonomic and diversity analyses, as described above, are the basic types of analyses that most QIIME 2 users are probably going to need to perform at some point. However, this is only the beginning, and there are so many more advanced analyses at our fingertips. 🖐️⌨️

Figure 8:Flowchart of “downstream” analysis workflows in QIIME 2. Note: this figure needs some updates; specifically, gneiss was deprecated and is no longer part of the amplicon distribution.
We are only going to give a brief overview, since each of these analyses has its own in-depth tutorial to guide us:
- Analyze longitudinal data: q2-longitudinal is a plugin for performing statistical analyses of longitudinal experiments, i.e., where samples are collected from individual patients/subjects/sites repeatedly over time. This includes longitudinal studies of alpha and beta diversity, and some really awesome, interactive plots. 📈🍝
- Predict the future (or the past) 🔮: q2-sample-classifier is a plugin for machine-learning 🤖 analyses of feature data. Both classification and regression models are supported. This allows you to do things like:
  - predict sample metadata as a function of feature data (e.g., can we use a fecal sample to predict cancer susceptibility? Or predict wine quality based on the microbial composition of grapes before fermentation?). 🍇
  - identify features that are predictive of different sample characteristics. 🚀
  - quantify rates of microbial maturation (e.g., to track normal microbiome development in the infant gut and the impacts of persistent malnutrition or antibiotics, diet, and delivery mode). 👶
  - predict outliers and mislabeled samples. 👹
- Test for differential abundance: differential abundance testing is used to determine which features are significantly more/less abundant in different groups of samples. QIIME 2 currently supports a few different approaches to differential abundance testing, including ancom-bc in q2-composition. 👾👾👾
- Evaluate and control data quality: q2-quality-control is a plugin for evaluating and controlling sequence data quality. This includes actions that:
  - test the accuracy of different bioinformatic or molecular methods, or of run-to-run quality variation. These actions are typically used if users have samples with known compositions, e.g., mock communities, since accuracy is calculated as the similarity between the observed and expected compositions, sequences, etc. But more creative uses may be possible...
  - filter sequences based on alignment to a reference database, or that contain specific short sections of DNA (e.g., primer sequences). This is useful for removing sequences that match a specific group of organisms, non-target DNA, or other nonsense. 🙃
And that’s just a brief overview! QIIME 2 continues to grow, so stay tuned for more plugins in future releases 📻, and keep your eyes peeled for stand-alone plugins that will continue to expand the functionality available in QIIME 2.
A good next step is to work through the Moving Pictures tutorial 🎥, if you haven’t done so already. That will help you learn how to actually use all of the functionality discussed here on real microbiome sequence data.
Now go forth and have fun! 💃