IDTAXA

 

Now, for the grand finale: we’ll use everything we’ve learned so far to study a fully developed microbiology machine learning package in R: IDTAXA.

IDTAXA is part of DECIPHER, a package hosted on the Bioconductor website. DECIPHER is a “software toolset that can be used for deciphering and managing biological sequences efficiently using R”. It contains functions for many different uses, including maintaining databases, aligning sequences, finding genes, and, the main use we’ll be studying, analysing sequences for classification (Murali, Bhargava, and Wright 2018).

IDTAXA offers two forms of identification: taxonomy by organism and taxonomy by function. “IDTAXA: Classify Organisms” takes an rRNA or ITS sequence and assigns it to a taxonomy of organisms. “IDTAXA: Classify Functions” takes a protein or coding sequence and assigns it to a taxonomy of functions.

The IDTAXA website contains a guide on how to perform an IDTAXA identification yourself in R. Here, that guide will be applied to (A) a pre-assembled mock data set, (B) three 16S rRNA sequences taken from an online database, and (C) 16S rRNA sequences from a randomly chosen published study.

The process goes as follows. First, store the location of the FASTA file and read it with readDNAStringSet() (or readRNAStringSet() if it’s RNA data). Then use RemoveGaps() to remove any gaps found in the data.

library(DECIPHER) # also attaches Biostrings, which provides readDNAStringSet()

fas <- "./data.raw/HMP_16S.fas.txt"

seqs <- readDNAStringSet(fas)

seqs <- RemoveGaps(seqs)
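To make the gap-removal step concrete, here is a minimal base-R sketch of what it does conceptually: aligned sequences contain gap characters ("-" or "."), and removing them leaves the raw sequence. The sequences and the names `aligned` and `ungapped` are made up for illustration, and this is not how DECIPHER implements RemoveGaps() internally.

```r
# Hypothetical aligned sequences; "-" and "." stand for alignment gaps
aligned <- c(seq1 = "ACG-T..GCA", seq2 = "--ACGTTGCA")

# Strip every gap character, leaving only the nucleotides
ungapped <- gsub("[-.]", "", aligned)

print(ungapped)
```

In the real workflow, RemoveGaps() performs this on the whole DNAStringSet at once, so there is no need to do it by hand.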

Then, load in the training data set downloaded from the DECIPHER website.

(Please keep in mind that loading this data and running the classifier is very CPU-intensive. To keep this .Rmd accessible to those without a powerful computer, I’ve run the code myself and will only show the plots as output.)

load("./data.raw/Contax_v1_March2018.RData")
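Because these steps are expensive, one possible way to avoid repeating them is to cache a result on disk and reload it on later runs. The sketch below shows that pattern in base R only; `cached()`, `slow_step()`, and the cache file name are hypothetical helpers invented for this example, and `slow_step()` merely stands in for a slow computation.

```r
# Hypothetical cache location inside the session's temporary directory
cache_file <- file.path(tempdir(), "ids_cache.rds")

# Stand-in for an expensive computation (not a real classification)
slow_step <- function() {
  Sys.sleep(0.1)
  list(taxon = c("Root", "Bacteria"), confidence = c(100, 95))
}

# Return the cached result if it exists, otherwise compute and store it
cached <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  saveRDS(result, path)
  result
}

first  <- cached(cache_file, slow_step)  # computes and writes the cache
second <- cached(cache_file, slow_step)  # reads the cache, skips the computation
```

With this pattern, only the first run pays the computational cost; a reader without a powerful machine could be handed the cache file instead.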

Now, let the IdTaxa algorithm run, store the output, and use plot() to immediately plot the result.

ids<-IdTaxa(seqs,
            trainingSet,
            strand = "both",
            threshold = 60,
            processors = NULL)
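In the call above, `strand = "both"` asks IdTaxa to consider both orientations of each sequence, `threshold = 60` is the confidence (in percent) at which a classification is cut off, and `processors = NULL` uses all available processors. To illustrate the role of the threshold, here is a base-R sketch on a made-up result. The list `mock_id` only imitates the `taxon` and `confidence` elements of one classified sequence, and `truncate_at_threshold()` is a hypothetical helper showing the idea, not DECIPHER's own code.

```r
# Hypothetical stand-in for one classified sequence
mock_id <- list(
  taxon      = c("Root", "Bacteria", "Firmicutes", "Bacilli", "Lactobacillales"),
  confidence = c(100, 98.2, 91.5, 74.0, 41.3)
)

# Keep ranks from the root downwards until confidence first drops below the threshold
truncate_at_threshold <- function(id, threshold = 60) {
  keep  <- cumprod(id$confidence >= threshold) == 1
  taxon <- id$taxon[keep]
  if (!all(keep)) {
    # Mark the cut-off point relative to the last confident rank
    taxon <- c(taxon, paste0("unclassified_", tail(taxon, 1)))
  }
  paste(taxon, collapse = "; ")
}

lineage <- truncate_at_threshold(mock_id, threshold = 60)
print(lineage)
```

A lower threshold would classify deeper into the tree at the cost of accuracy, while a higher one would stop earlier.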
  |=========================================================================================================                                              |  70%
  |                                                                                                                                                             
  |==========================================================================================================                                             |  70%
  |                                                                                                                                                             
  |===========================================================================================================                                            |  71%
  |                                                                                                                                                             
  |============================================================================================================                                           |  72%
  |                                                                                                                                                             
  |==============================================================================================================                                         |  73%
  |                                                                                                                                                             
  |===============================================================================================================                                        |  73%
  |                                                                                                                                                             
  |================================================================================================================                                       |  74%
  |                                                                                                                                                             
  |=================================================================================================================                                      |  75%
  |                                                                                                                                                             
  |==================================================================================================================                                     |  75%
  |                                                                                                                                                             
  |===================================================================================================================                                    |  76%
  |                                                                                                                                                             
  |====================================================================================================================                                   |  77%
  |                                                                                                                                                             
  |=====================================================================================================================                                  |  77%
  |                                                                                                                                                             
  |======================================================================================================================                                 |  78%
  |                                                                                                                                                             
  |=======================================================================================================================                                |  79%
  |                                                                                                                                                             
  |========================================================================================================================                               |  80%
  |                                                                                                                                                             
  |=========================================================================================================================                              |  80%
  |                                                                                                                                                             
  |==========================================================================================================================                             |  81%
  |                                                                                                                                                             
  |===========================================================================================================================                            |  82%
  |                                                                                                                                                             
  |============================================================================================================================                           |  82%
  |                                                                                                                                                             
  |=============================================================================================================================                          |  83%
  |                                                                                                                                                             
  |===============================================================================================================================                        |  84%
  |                                                                                                                                                             
  |================================================================================================================================                       |  85%
  |                                                                                                                                                             
  |=================================================================================================================================                      |  85%
  |                                                                                                                                                             
  |==================================================================================================================================                     |  86%
  |                                                                                                                                                             
  |===================================================================================================================================                    |  87%
  |                                                                                                                                                             
  |====================================================================================================================================                   |  87%
  |                                                                                                                                                             
  |=====================================================================================================================================                  |  88%
  |                                                                                                                                                             
  |======================================================================================================================================                 |  89%
  |                                                                                                                                                             
  |=======================================================================================================================================                |  89%
  |                                                                                                                                                             
  |========================================================================================================================================               |  90%
  |                                                                                                                                                             
  |=========================================================================================================================================              |  91%
  |                                                                                                                                                             
  |==========================================================================================================================================             |  92%
  |                                                                                                                                                             
  |===========================================================================================================================================            |  92%
  |                                                                                                                                                             
  |============================================================================================================================================           |  93%
  |                                                                                                                                                             
  |=============================================================================================================================================          |  94%
  |                                                                                                                                                             
  |==============================================================================================================================================         |  94%
  |                                                                                                                                                             
  |================================================================================================================================================       |  95%
  |                                                                                                                                                             
  |=================================================================================================================================================      |  96%
  |                                                                                                                                                             
  |==================================================================================================================================================     |  96%
  |                                                                                                                                                             
  |===================================================================================================================================================    |  97%
  |                                                                                                                                                             
  |====================================================================================================================================================   |  98%
  |                                                                                                                                                             
  |=====================================================================================================================================================  |  99%
  |                                                                                                                                                             
  |====================================================================================================================================================== |  99%
  |                                                                                                                                                             
  |=======================================================================================================================================================| 100%
## 
## Time difference of 8.44 secs
print(ids)
##   A test set of class 'Taxa' with length 118
##       confidence name                 taxon
##   [1]        77% Acinetobacter bau... Root; Bacteria; Proteobacteria; Gammaproteobacteria; Pseudomonadales; Moraxellaceae; Acinetobacter                         
##   [2]        77% Acinetobacter bau... Root; Bacteria; Proteobacteria; Gammaproteobacteria; Pseudomonadales; Moraxellaceae; Acinetobacter                         
##   [3]        77% Acinetobacter bau... Root; Bacteria; Proteobacteria; Gammaproteobacteria; Pseudomonadales; Moraxellaceae; Acinetobacter                         
##   [4]        77% Acinetobacter bau... Root; Bacteria; Proteobacteria; Gammaproteobacteria; Pseudomonadales; Moraxellaceae; Acinetobacter                         
##   [5]        77% Acinetobacter bau... Root; Bacteria; Proteobacteria; Gammaproteobacteria; Pseudomonadales; Moraxellaceae; Acinetobacter                         
##   ...        ... ...                  ...
## [114]        58% Streptococcus mut... Root; unclassified_Root                                                                                                    
## [115]        81% Streptococcus pne... Root; Bacteria; Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; Streptococcus                                      
## [116]        81% Streptococcus pne... Root; Bacteria; Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; Streptococcus                                      
## [117]        81% Streptococcus pne... Root; Bacteria; Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; Streptococcus                                      
## [118]        81% Streptococcus pne... Root; Bacteria; Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; Streptococcus
plot(ids)

And it's as easy as that: you’ve used the IdTaxa classifier to identify the given 16S rRNA sequences.
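To work with the result beyond the printed summary, it can help to flatten it into a table. The sketch below assumes that each element of `ids` behaves like a list holding a `taxon` vector and a matching `confidence` vector, which is how the DECIPHER documentation describes the output. `mock_ids` and `summarise_ids()` are hypothetical names, and `mock_ids` is a hand-made stand-in so the block runs without DECIPHER or the training set. Only the final confidences (77 and 81) are taken from the output above; the per-rank values are made up for illustration.

```r
# Hypothetical stand-in for the object returned by IdTaxa().
# Assumption: each element has a `taxon` vector and a `confidence` vector
# of equal length. Only the last confidences (77, 81) come from the printed
# output; the other numbers are invented for illustration.
mock_ids <- list(
  list(taxon = c("Root", "Bacteria", "Proteobacteria", "Gammaproteobacteria",
                 "Pseudomonadales", "Moraxellaceae", "Acinetobacter"),
       confidence = c(100, 100, 95, 95, 90, 85, 77)),
  list(taxon = c("Root", "Bacteria", "Firmicutes", "Bacilli",
                 "Lactobacillales", "Streptococcaceae", "Streptococcus"),
       confidence = c(100, 100, 98, 96, 92, 88, 81))
)

# Hypothetical helper: one row per sequence, with the full lineage,
# the lowest assigned taxon, and the confidence of that lowest taxon.
summarise_ids <- function(ids) {
  data.frame(
    lineage = vapply(ids, function(x) paste(x$taxon, collapse = "; "),
                     character(1)),
    lowest_taxon = vapply(ids, function(x) x$taxon[length(x$taxon)],
                          character(1)),
    confidence = vapply(ids, function(x) x$confidence[length(x$confidence)],
                        numeric(1)),
    stringsAsFactors = FALSE
  )
}

assignments <- summarise_ids(mock_ids)
print(assignments)
```

If the assumption about the object's structure holds, calling `summarise_ids(ids)` on the real result should give one row per classified sequence, which is easier to filter or export than the printed summary.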

That was for a data set that was already pre-assembled. But what if you’ve got a handful of individual 16S rRNA sequences, each in its own file, and you want to identify all of them? For that, we’ll need to take the sequences ourselves and combine them into a single set in R.

I’ve taken three random bacterial DNA sequences:

fas1<-"./data.raw/AB002521.1.fasta"
fas2<-"./data.raw/AB243005.1.fasta"
fas3<-"./data.raw/AB594754.1.fasta"

seqs<-readDNAStringSet(c(fas1, fas2, fas3))

seqs<-RemoveGaps(seqs)
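If you would rather end up with one combined FASTA file on disk, the files can also be concatenated before reading. This is a minimal base R sketch, not part of the IDTAXA guide: `combine_fasta()` is a hypothetical helper, and the two records in the demo are made up so the block runs on its own.

```r
# Hypothetical helper: concatenate several FASTA files into one file.
combine_fasta <- function(paths, out) {
  lines <- unlist(lapply(paths, readLines))
  lines <- lines[nzchar(trimws(lines))]  # drop blank lines between records
  writeLines(lines, out)
  invisible(out)
}

# Self-contained demo with two tiny made-up records in temporary files.
f1 <- tempfile(fileext = ".fasta")
f2 <- tempfile(fileext = ".fasta")
writeLines(c(">seq_a", "ACGTACGT"), f1)
writeLines(c(">seq_b", "TTGACCA", ""), f2)

combined <- combine_fasta(c(f1, f2), tempfile(fileext = ".fasta"))

# Count the records by counting header lines.
n_records <- sum(startsWith(readLines(combined), ">"))
```

The combined file could then be read with a single path, in the same way as the pre-assembled data set earlier in this guide.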

ids<-IdTaxa(seqs,
            trainingSet,
            strand = "both",
            threshold = 60,
            processors = NULL)
## 
  |=======================================================================================================================================================| 100%
## 
## Time difference of 0.84 secs
print(ids)
##   A test set of class 'Taxa' with length 3
##     confidence name                 taxon
## [1]        80% ENA|AB002521|AB00... Root; Bacteria; Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; Streptococcus                                        
## [2]        81% ENA|AB243005|AB24... Root; Bacteria; Firmicutes; Bacilli; Bacillales; Staphylococcaceae; Staphylococcus                                           
## [3]        85% ENA|AB594754|AB59... Root; Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Salmonella
plot(ids)
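The `threshold = 60` argument in the call above sets the confidence at which the reported classification is truncated. The sketch below is a simplified illustration of that idea and not DECIPHER's actual implementation: `truncate_lineage()` is a hypothetical helper, the confidence values are made up, and the rule that the root is always kept is an assumption based on entries such as `Root; unclassified_Root` in the first result.

```r
# Hypothetical helper, not DECIPHER's code: cut a lineage at the first rank
# whose confidence falls below the threshold and mark the rest as unclassified.
truncate_lineage <- function(taxon, confidence, threshold = 60) {
  keep <- cumsum(confidence < threshold) == 0  # TRUE until the first failing rank
  keep[1] <- TRUE                              # assumption: the root is always reported
  kept <- taxon[keep]
  if (!all(keep)) {
    kept <- c(kept, paste0("unclassified_", kept[length(kept)]))
  }
  paste(kept, collapse = "; ")
}

# Made-up confidences for illustration only.
taxon <- c("Root", "Bacteria", "Firmicutes", "Bacilli")
confident  <- truncate_lineage(taxon, c(100, 95, 90, 81))
partial    <- truncate_lineage(taxon, c(100, 95, 55, 40))
unresolved <- truncate_lineage(taxon, c(58, 50, 45, 40))
```

Under these assumptions, raising the threshold trades resolution for reliability: fewer ranks are reported, but the ones that remain carry higher confidence.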

References

Murali, Adithya, Aniruddha Bhargava, and Erik S. Wright. 2018. “IDTAXA: A Novel Approach for Accurate Taxonomic Classification of Microbiome Sequences.” Microbiome 6 (1): 140. https://doi.org/10.1186/s40168-018-0521-5.