Detect chimeras without external references (i.e. de novo)
Source: R/vs_uchime_denovo.R
vs_uchime_denovo.Rd
vs_uchime_denovo detects chimeras present in the FASTA sequences in fasta_input using VSEARCH's uchime_denovo algorithm.
Automatically sorts sequences by decreasing abundance to enhance chimera
detection accuracy.
Usage
vs_uchime_denovo(
  fasta_input,
  nonchimeras = NULL,
  chimeras = NULL,
  sizein = TRUE,
  sizeout = TRUE,
  relabel = NULL,
  relabel_sha1 = FALSE,
  fasta_width = 0,
  sample = NULL,
  log_file = NULL,
  vsearch_options = NULL,
  tmpdir = NULL
)
Arguments
- fasta_input
(Required). A FASTA file path or a FASTA object with reads. See Details.
- nonchimeras
(Optional). Name of the FASTA output file for the non-chimeric sequences. If NULL (default), no output is written to file.
- chimeras
(Optional). Name of the FASTA output file for the chimeric sequences. If NULL (default), no output is written to file.
- sizein
(Optional). If TRUE (default), abundance annotations present in sequence headers are taken into account.
- sizeout
(Optional). If TRUE (default), abundance annotations are added to FASTA headers.
- relabel
(Optional). Relabel sequences using the given prefix and a ticker to construct new headers. Defaults to NULL.
- relabel_sha1
(Optional). If TRUE, relabel sequences using the SHA1 message digest algorithm. Defaults to FALSE.
- fasta_width
(Optional). Number of characters per line in the output FASTA file. Defaults to 0, which eliminates wrapping.
- sample
(Optional). Add the given sample identifier string to sequence headers. For instance, if the given string is "ABC", the text ";sample=ABC" will be added to the header. If NULL (default), no identifier is added.
- log_file
(Optional). Name of the log file to capture messages from VSEARCH. If NULL (default), no log file is created.
- vsearch_options
(Optional). Additional arguments to pass to VSEARCH. Defaults to NULL. See Details.
- tmpdir
(Optional). Path to the directory where temporary files should be written when tables are used as input or output. Defaults to NULL, which resolves to the session-specific temporary directory (tempdir()).
Value
A tibble or NULL.
If nonchimeras and chimeras are specified, the sequences resulting from chimera detection are written directly to the specified files in FASTA format, and no tibbles are returned.
If nonchimeras and chimeras are NULL, a FASTA object containing the non-chimeric sequences is returned, with an attribute "chimeras" containing a tibble of chimeric sequences. If no chimeras are found, the "chimeras" attribute is an empty data frame.
Additionally, the returned tibble (when applicable) has an attribute "statistics" containing a tibble with chimera detection statistics.
The statistics tibble has the following columns:
- num_nucleotides: Total number of nucleotides used as input for chimera detection.
- num_sequences: Total number of sequences used as input for chimera detection.
- min_length_input_seq: Length of the shortest sequence used as input for chimera detection.
- max_length_input_seq: Length of the longest sequence used as input for chimera detection.
- avg_length_input_seq: Average length of the sequences used as input for chimera detection.
- num_non_chimeras: Number of non-chimeric sequences.
- num_chimeras: Number of chimeric sequences.
- input: Name of the input file/object for the chimera detection.
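As a rough illustration of how these columns can be combined, the sketch below computes the share of chimeric sequences among those classified as chimeric or non-chimeric. The data frame is a made-up stand-in for the "statistics" attribute (its values and the input name are invented for illustration), and chimera_fraction is a hypothetical helper that is not part of the package.

```r
# Made-up stand-in for attr(result, "statistics"); all values are invented.
statistics.tbl <- data.frame(
  num_sequences    = 100L,
  num_non_chimeras = 90L,
  num_chimeras     = 10L,
  input            = "example.fa"
)

# Hypothetical helper: fraction of classified sequences flagged as chimeric.
# Returns NA when no sequence was classified.
chimera_fraction <- function(stats) {
  classified <- stats$num_chimeras + stats$num_non_chimeras
  if (classified == 0) {
    return(NA_real_)
  }
  stats$num_chimeras / classified
}

chimera_fraction(statistics.tbl)
```

With a real result, the same helper could be applied to the tibble obtained from attr(result, "statistics").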
Details
Chimeras in the input FASTA sequences are detected using VSEARCH's uchime_denovo. In de novo mode, the input FASTA file/object must contain abundance annotations (i.e. a pattern [;]size=integer[;] in the header).
Input order matters for chimera detection, so it is recommended to sort sequences by decreasing abundance.
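To inspect input before running chimera detection, the base R sketch below checks that every header carries a size annotation and orders the sequences by decreasing abundance. The toy data frame mimics a FASTA object (columns Header and Sequence) with made-up content, and get_size is a hypothetical helper, not a package function.

```r
# Toy FASTA object with made-up headers and sequences.
fasta <- data.frame(
  Header   = c("seq1;size=3", "seq2;size=120;", "seq3;size=45"),
  Sequence = c("ACGTACGT", "TTGACCGA", "GGCATCAA"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: extract the integer following "size=" in each header.
# Headers without an abundance annotation yield NA.
get_size <- function(header) {
  pattern <- "(^|;)size=([0-9]+)(;|$)"
  has_size <- grepl(pattern, header, perl = TRUE)
  size <- rep(NA_integer_, length(header))
  size[has_size] <- as.integer(
    sub(paste0(".*", pattern, ".*"), "\\2", header[has_size], perl = TRUE)
  )
  size
}

sizes <- get_size(fasta$Header)

# De novo mode needs an abundance annotation on every sequence.
stopifnot(!anyNA(sizes))

# Order the sequences from most to least abundant.
fasta_sorted <- fasta[order(sizes, decreasing = TRUE), ]
fasta_sorted$Header
```

A check like this only verifies the annotations; it does not replace any sorting the function performs itself.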
fasta_input can either be a FASTA file or a FASTA object. FASTA objects are tibbles that contain the columns Header and Sequence, see readFasta.
If nonchimeras and chimeras are specified, the resulting non-chimeric and chimeric sequences are written to these files in FASTA format.
If nonchimeras and chimeras are NULL, results are returned as a FASTA object.
nonchimeras and chimeras must either both be specified or both be NULL.
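The both-or-neither rule can be expressed as a small pre-flight check. This is only a sketch of the documented constraint: check_output_pair is a hypothetical helper, and how the function itself validates these arguments is not shown in this page.

```r
# Hypothetical helper: stop unless both output files are given or both are NULL.
check_output_pair <- function(nonchimeras = NULL, chimeras = NULL) {
  if (is.null(nonchimeras) != is.null(chimeras)) {
    stop("'nonchimeras' and 'chimeras' must both be specified or both be NULL.")
  }
  invisible(TRUE)
}

check_output_pair()                                # both NULL: passes
check_output_pair("nonchimeras.fa", "chimeras.fa") # both given: passes
```

Calling check_output_pair("nonchimeras.fa") with only one file name would raise an error.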
vsearch_options allows users to pass additional command-line arguments to VSEARCH that are not directly supported by this function. Refer to the VSEARCH manual for more details.
Examples
if (FALSE) { # \dontrun{
# Define arguments
fasta_input <- file.path(path.package("Rsearch"), "extdata", "small_R1.fq")
nonchimeras <- "nonchimeras.fa"
chimeras <- "chimeras.fa"

# Detect chimeras with default parameters and write FASTA files
vs_uchime_denovo(fasta_input = fasta_input,
                 nonchimeras = nonchimeras,
                 chimeras = chimeras)

# Detect chimeras with default parameters and return a FASTA tibble
nonchimeras.tbl <- vs_uchime_denovo(fasta_input = fasta_input,
                                    nonchimeras = NULL,
                                    chimeras = NULL)

# Get chimeras tibble
chimeras.tbl <- attr(nonchimeras.tbl, "chimeras")

# Get statistics tibble
statistics.tbl <- attr(nonchimeras.tbl, "statistics")
} # }