vs_sintax
classifies sequences using the Sintax algorithm implemented in VSEARCH.
Usage
vs_sintax(
  fasta_input,
  database,
  outfile = NULL,
  cutoff = NULL,
  strand = "plus",
  randseed = NULL,
  logfile = NULL,
  threads = 1,
  vsearch_options = NULL,
  tmpdir = NULL
)
Arguments
- fasta_input
(Required). A FASTA file path or a FASTA object with reads to classify, see Details.
- database
(Required). A FASTA file path or a FASTA object containing the reference database in FASTA format. The sequences need to be annotated with taxonomy, see Details.
- outfile
(Optional). Name of the output file. If NULL (default), results are returned as a data.frame.
- cutoff
(Optional). Minimum level of bootstrap support (0.0-1.0) for the classifications. Defaults to 0.0.
- strand
(Optional). Specifies which strand to consider when comparing sequences. Can be either "plus" (default) or "both".
- randseed
(Optional). Seed for the random number generator used in the Sintax algorithm. Defaults to NULL.
- logfile
(Optional). Name of the log file to capture messages from VSEARCH. If NULL (default), no log file is created.
- threads
(Optional). Number of computational threads to be used by VSEARCH. Defaults to 1.
- vsearch_options
(Optional). A character string of additional arguments to pass to VSEARCH. Defaults to NULL. See Details.
- tmpdir
(Optional). Path to the directory where temporary files should be written when tables are used as input or output. Defaults to NULL, which resolves to the session-specific temporary directory (tempdir()).
Value
If outfile is NULL, a data.frame is returned. If it contains a file name (text), the data.frame is written to that file with tab-separated columns.
The data.frame contains the classification results for each input sequence. Both the Header and Sequence columns of fasta_input are copied into this table, along with one column for each rank. The ranks depend on the database file used, but are typically domain, phylum, class, order, family, genus and species. Each classification also has a bootstrap support score. These scores are stored in separate columns with corresponding names, i.e. domain_score, phylum_score, etc.
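As a rough illustration of how the rank and score columns pair up, the sketch below builds a mock table by hand and masks genus assignments with weak bootstrap support. The rows, the threshold of 0.8 and the extra column genus_filtered are made up for illustration; nothing here calls VSEARCH or vs_sintax.

```r
# Mock result table using the column layout described above.
# All values are invented for illustration; no VSEARCH run is involved.
tax.tbl <- data.frame(
  Header      = c("read1", "read2", "read3"),
  Sequence    = c("ACGTACGT", "GGCCTTAA", "TTGACCAG"),
  genus       = c("Streptococcus", "Bacillus", "Escherichia"),
  genus_score = c(0.98, 0.55, 0.81),
  stringsAsFactors = FALSE
)

# Hypothetical threshold: keep a genus only if its support is at least 0.8
min.support <- 0.8
tax.tbl$genus_filtered <- ifelse(tax.tbl$genus_score >= min.support,
                                 tax.tbl$genus,
                                 NA_character_)

print(tax.tbl[, c("Header", "genus", "genus_score", "genus_filtered")])
```

The same pattern applies to any other rank by swapping in its column and the matching _score column.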
Details
The sequences in the input file are classified according to the Sintax algorithm, using VSEARCH; see https://www.biorxiv.org/content/10.1101/074161v1.
fasta_input can either be a file path to a FASTA file or a FASTA object. FASTA objects are tibbles that contain the columns Header and Sequence, see readFasta.
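To make the shape of a FASTA object concrete, here is a minimal sketch with two made-up reads. It uses a base data.frame to stay dependency-free; the package describes FASTA objects as tibbles, so converting with tibble::as_tibble() before passing the table on is the safer assumption. The helper has_fasta_columns is hypothetical and not part of the package.

```r
# Two invented reads in the Header/Sequence layout described above
fasta.obj <- data.frame(
  Header   = c("read1", "read2"),
  Sequence = c("ACGTACGTAC", "GGCCTTAAGG"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: checks that both required columns are present
has_fasta_columns <- function(x) {
  all(c("Header", "Sequence") %in% names(x))
}

print(has_fasta_columns(fasta.obj))
```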
database can either be a file path to a FASTA file or a FASTA object. FASTA objects are tibbles that contain the columns Header and Sequence, see readFasta. The Header texts of the database must follow the sintax-pattern, see make_sintax_db.
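For orientation, the sketch below parses one header written in the taxonomy annotation style documented for the SINTAX format, where a tax= field lists rank-letter and name pairs separated by commas and ends with a semicolon. The header string is illustrative, the function parse_sintax_header is hypothetical, and it is an assumption that make_sintax_db produces headers of exactly this form.

```r
# Illustrative header in SINTAX style (no leading ">")
header <- paste0("AB008314;tax=d:Bacteria,p:Firmicutes,c:Bacilli,",
                 "o:Lactobacillales,f:Streptococcaceae,g:Streptococcus;")

# Hypothetical helper: returns taxon names, named by their rank letter
parse_sintax_header <- function(h) {
  # Keep only the text between "tax=" and the next ";"
  tax <- sub(".*tax=([^;]*);?.*", "\\1", h)
  # Split into fields of the form "<rank letter>:<name>"
  fields <- strsplit(tax, ",", fixed = TRUE)[[1]]
  ranks <- sub(":.*", "", fields)
  taxa  <- sub("^[^:]*:", "", fields)
  setNames(taxa, ranks)
}

parsed <- parse_sintax_header(header)
print(parsed)
```

A check like this can help spot database headers that lack a tax= field before starting a classification run.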
vsearch_options allows users to pass additional command-line arguments to VSEARCH that are not directly supported by this function. Refer to the VSEARCH manual for more details.
Examples
if (FALSE) { # \dontrun{
# Example files shipped with the Rsearch package
extdata.dir <- file.path(path.package("Rsearch"), "extdata")
db.file <- file.path(extdata.dir, "sintax_db.fasta")
fasta.file <- file.path(extdata.dir, "small.fasta")
# Classify the reads; with outfile = NULL the result is returned as a data.frame
tax.tbl <- vs_sintax(fasta_input = fasta.file, database = db.file)
View(tax.tbl)
} # }