vs_optimize_truncqual optimizes the truncation parameter
truncqual to achieve the best possible merging results. The function
iterates through a specified range of truncqual values to identify the
optimal value that maximizes the proportion of high-quality merged read pairs.
Usage
vs_optimize_truncqual(
  fastq_input,
  reverse = NULL,
  minovlen = 10,
  truncqual_range = 1:20,
  minlen = 1,
  min_size = 2,
  maxee_rate = 0.01,
  threads = 1,
  plot_title = TRUE,
  tmpdir = NULL
)
Arguments
- fastq_input
(Required). A FASTQ file path, FASTQ tibble (forward reads), or a paired-end tibble of class "pe_df". See Details.
- reverse
(Optional). A FASTQ file path or FASTQ tibble (reverse reads). Optional if fastq_input is a "pe_df" object.
- minovlen
(Optional). Minimum overlap between the merged reads. Must be at least 5. Defaults to 10.
- truncqual_range
(Optional). A numeric vector of truncqual values to test. Sequences are truncated starting from the first base with the specified base quality score or lower. Defaults to 1:20.
- minlen
(Optional). Minimum number of bases a sequence must have to be retained. Defaults to 1. See Details.
- min_size
(Optional). Minimum copy number (size) for a merged read to be included in the results. Defaults to 2.
- maxee_rate
(Optional). Threshold for average expected error. Must range from 0.0 to 1.0. Defaults to 0.01. See Details.
- threads
(Optional). Number of computational threads to be used by VSEARCH. Defaults to 1.
- plot_title
(Optional). If TRUE (default), a summary title will be displayed in the plot. Set to FALSE for no title.
- tmpdir
(Optional). Path to the directory where temporary files should be written when tables are used as input or output. Defaults to NULL, which resolves to the session-specific temporary directory (tempdir()).
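The truncqual and maxee_rate descriptions above can be illustrated with a small base R sketch. It is not how VSEARCH or Rsearch implement these steps. The helpers truncate_at_qual() and ee_rate() are hypothetical names, and the quality scores are made up. The sketch assumes the standard Phred relation, where a base with quality Q has error probability 10^(-Q/10).

```r
# Toy Phred quality scores for a single read (made-up values)
quals <- c(38, 37, 35, 30, 12, 33, 8, 2)

# Hypothetical helper: truncate at the first base whose quality is
# equal to or lower than truncqual, keeping only the bases before it
truncate_at_qual <- function(q, truncqual) {
  hit <- which(q <= truncqual)
  if (length(hit) == 0) return(q)   # no base at or below the threshold
  q[seq_len(hit[1] - 1)]
}

# Hypothetical helper: average expected error per base,
# i.e. the mean of the per-base error probabilities 10^(-Q/10)
ee_rate <- function(q) {
  if (length(q) == 0) return(NA_real_)
  mean(10^(-q / 10))
}

kept <- truncate_at_qual(quals, truncqual = 12)
length(kept)            # number of bases retained, compared against minlen
ee_rate(kept) <= 0.01   # would this read pass maxee_rate = 0.01?
```

In this toy read, a higher truncqual removes more low-quality bases, which lowers the average expected error but shortens the read and leaves less sequence available for the overlap required by minovlen.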
Value
A data frame with the following columns:
- truncqual_value: Tested truncqual value.
- merged_read_pairs: Count of merged read-pairs with a copy number above min_size after dereplication.
- R1_length: Average length of R1-reads after trimming.
- R2_length: Average length of R2-reads after trimming.
The returned data frame has an attribute named "plot" containing a
ggplot2 object based on the returned data frame. The
plot visualizes truncqual values against merged_read_pairs,
R1_length, and R2_length, with the optimal truncqual
value marked by a red dashed line.
Additionally, the returned data frame has an attribute named
"optimal_truncqual" containing the optimal truncqual value.
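As a minimal sketch of how the attributes can be read, the snippet below builds a toy stand-in for the returned data frame and retrieves "optimal_truncqual" with attr(). All values are made up for illustration. A real result comes from vs_optimize_truncqual() and requires Rsearch and VSEARCH.

```r
# Toy stand-in for the returned data frame (all values are made up)
res <- data.frame(
  truncqual_value   = c(2, 5, 10),
  merged_read_pairs = c(800, 950, 700),
  R1_length         = c(250, 240, 210),
  R2_length         = c(245, 230, 190)
)
attr(res, "optimal_truncqual") <- 5

# Retrieve the optimal value from the attribute
best <- attr(res, "optimal_truncqual")

# Look up the summary row that corresponds to the optimal value
best_row <- res[res$truncqual_value == best, ]
best_row
```

The "plot" attribute is retrieved the same way, with attr(res, "plot"), and printed to display the figure.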
Details
The function uses vs_fastq_mergepairs,
vs_fastx_trim_filt, and vs_fastx_uniques; the arguments passed on to
these functions are described in detail in their documentation.
If fastq_input has class "pe_df", the reverse reads will be
automatically extracted from the "reverse" attribute unless
explicitly provided in the reverse argument.
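The precedence described above can be sketched in base R as follows. This is an illustration of the rule, not the package's internal code. The helper resolve_reverse() is hypothetical, the toy reads are made up, and the column names Header, Sequence, and Quality are an assumption about the FASTQ table layout.

```r
# Toy forward and reverse read tables (made-up data, assumed column names)
fwd_reads <- data.frame(Header = "read1", Sequence = "ACGT", Quality = "IIII")
rev_reads <- data.frame(Header = "read1", Sequence = "TTGA", Quality = "IIII")

# Toy paired-end object: forward reads with class "pe_df" and the
# reverse reads stored in the "reverse" attribute
pe <- structure(fwd_reads,
                class = c("pe_df", class(fwd_reads)),
                reverse = rev_reads)

# Hypothetical helper: an explicit reverse argument wins; otherwise
# fall back to the "reverse" attribute of a "pe_df" object
resolve_reverse <- function(fastq_input, reverse = NULL) {
  if (!is.null(reverse)) return(reverse)
  if (inherits(fastq_input, "pe_df")) return(attr(fastq_input, "reverse"))
  stop("reverse must be supplied unless fastq_input is a 'pe_df' object")
}

from_attribute <- resolve_reverse(pe)                      # uses the attribute
from_argument  <- resolve_reverse(pe, reverse = fwd_reads) # explicit value wins
```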
The optimal truncation value (truncqual) for merging is the one that
yields the largest number of merged read-pairs with a copy number above the
number specified by min_size after dereplication.
Changing min_size will affect the results. A low min_size will
include merged sequences with a lower copy number after dereplication, and a
higher min_size will filter out more reads and only count
high-frequency merged sequences.
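The effect of min_size and the selection of the optimum can be sketched with toy data in base R. This is a sketch under stated assumptions, not the package's implementation. The helper count_kept() is hypothetical, the threshold is treated as inclusive (size >= min_size), read pairs are counted by summing copy numbers, and ties are resolved by taking the first maximum.

```r
# Toy merged sequences, one entry per merged read pair (made-up data)
merged <- c("ACGT", "ACGT", "ACGT", "TTGA", "TTGA", "GGCC")

# Dereplicate: copy number (size) of each unique sequence
sizes <- as.integer(table(merged))

# Hypothetical helper: number of read pairs that belong to sequences
# passing the threshold (assumption: inclusive threshold, sizes summed)
count_kept <- function(sizes, min_size) {
  sum(sizes[sizes >= min_size])
}

# A higher min_size keeps fewer read pairs
counts_by_min_size <- vapply(1:3, function(m) count_kept(sizes, m), integer(1))
counts_by_min_size

# Selecting the optimum: the truncqual value with the highest count
# (made-up counts; which.max() returns the first maximum on ties)
truncqual_values  <- c(2, 5, 10)
merged_read_pairs <- c(800, 950, 700)
best <- truncqual_values[which.max(merged_read_pairs)]
best
```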
Examples
if (FALSE) { # \dontrun{
  # Define arguments
  R1.file <- file.path(file.path(path.package("Rsearch"), "extdata"),
                       "small_R1.fq")
  R2.file <- file.path(file.path(path.package("Rsearch"), "extdata"),
                       "small_R2.fq")
  # Run optimizing function
  optimize.tbl <- vs_optimize_truncqual(fastq_input = R1.file,
                                        reverse = R2.file)
  # Display plot
  print(attr(optimize.tbl, "plot"))
} # }