Optimize read truncation with truncee_rate
Source: R/vs_optimize_truncee_rate.R
vs_optimize_truncee_rate optimizes the truncation parameter truncee_rate to achieve the best possible merging results. The function iterates through a specified range of truncee_rate values to identify the value that maximizes the number of high-quality merged read pairs.
Arguments
- fastq_input
  (Required). A FASTQ file path, FASTQ tibble (forward reads), or a paired-end tibble of class "pe_df". See Details.
- reverse
  (Optional). A FASTQ file path or FASTQ tibble (reverse reads). Optional if fastq_input is a "pe_df" object.
- minovlen
  (Optional). Minimum overlap between the merged reads. Must be at least 5. Defaults to 10.
- truncee_rate_range
  (Optional). A numeric vector of truncee_rate values to test. Sequences are truncated so that their average expected error per base is lower than the specified value. Defaults to (0.002, 0.004, 0.006, 0.008, 0.010, 0.012, 0.014, 0.016, 0.018, 0.020, 0.022, 0.024, 0.026, 0.028, 0.030, 0.032, 0.034, 0.036, 0.038, 0.040).
- minlen
  (Optional). Minimum number of bases a sequence must have to be retained. Defaults to 0. See Details.
- min_size
  (Optional). Minimum copy number (size) for a merged read to be included in the results. Defaults to 2.
- maxee_rate
  (Optional). Threshold for average expected error. Must range from 0.0 to 1.0. Defaults to 0.01. See Details.
- threads
  (Optional). Number of computational threads to be used by VSEARCH. Defaults to 1.
- plot_title
  (Optional). If TRUE (default), a summary title will be displayed in the plot. Set to FALSE for no title.
- tmpdir
  (Optional). Path to the directory where temporary files should be written when tables are used as input or output. Defaults to NULL, which resolves to the session-specific temporary directory (tempdir()).
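The truncee_rate_range description says that sequences are truncated so that their average expected error per base stays below the tested value. The sketch below illustrates one plausible reading of that rule in base R. It is not the VSEARCH implementation: the helper names phred_to_prob and truncate_by_ee_rate are hypothetical, and the assumption that truncation happens just before the first position where the running average exceeds the threshold is ours.

```r
# Convert Phred quality scores to per-base error probabilities.
phred_to_prob <- function(q) 10^(-q / 10)

# Hypothetical helper (not part of Rsearch or VSEARCH): return the number of
# leading bases to keep so that the running average expected error per base
# never exceeds truncee_rate. Assumes truncation just before the first
# position where the running average goes above the threshold.
truncate_by_ee_rate <- function(quals, truncee_rate) {
  p <- phred_to_prob(quals)
  ee_rate <- cumsum(p) / seq_along(p)   # running average expected error
  over <- which(ee_rate > truncee_rate)
  if (length(over) == 0) length(quals) else over[1] - 1
}

# Made-up quality scores that degrade towards the 3' end.
quals <- c(40, 40, 40, 30, 20, 10, 10)

keep_strict  <- truncate_by_ee_rate(quals, 0.002)  # strictest default value
keep_lenient <- truncate_by_ee_rate(quals, 0.040)  # most lenient default value
```

Under these assumptions a stricter truncee_rate keeps fewer bases (here 4 of 7) while the most lenient default keeps the whole read, which is the trade-off the function explores: shorter reads are cleaner but may no longer overlap enough to merge.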
Value
A data frame with the following columns:
- truncee_rate_value: Tested truncee_rate value.
- merged_read_pairs: Count of merged read-pairs with a copy number above min_size after dereplication.
- R1_length: Average length of R1-reads after trimming.
- R2_length: Average length of R2-reads after trimming.
The returned data frame has an attribute named "plot" containing a ggplot2 object based on the returned data frame. The plot visualizes truncee_rate values against merged_read_pairs, R1_length, and R2_length, with the optimal truncee_rate value marked by a red dashed line.
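To read the optimal value off the returned table rather than the plot, a sketch along the following lines may help. The data frame here is mock data with the documented column names, and all numbers are invented for illustration. It assumes the optimum is the row with the largest merged_read_pairs; how the function itself breaks ties is not stated, and which.max simply takes the first maximum.

```r
# Mock result table with the documented columns (values are invented).
optimize.tbl <- data.frame(
  truncee_rate_value = c(0.002, 0.004, 0.006, 0.008),
  merged_read_pairs  = c(120, 180, 175, 150),
  R1_length          = c(180.5, 210.2, 230.9, 241.3),
  R2_length          = c(150.1, 185.7, 205.4, 220.8)
)

# Assumed criterion: the row with the highest merged read-pair count.
best <- optimize.tbl[which.max(optimize.tbl$merged_read_pairs), ]
best_rate <- best$truncee_rate_value
```

With a real result, the same two lines can be applied directly to the data frame returned by vs_optimize_truncee_rate.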
Details
The function uses vs_fastq_mergepairs, vs_fastx_trim_filt, and vs_fastx_uniques. See the documentation of these functions for a detailed description of their arguments.
If fastq_input has class "pe_df", the reverse reads will be automatically extracted from the "reverse" attribute unless explicitly provided in the reverse argument.
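The lookup rule for reverse reads can be sketched in base R as follows. This is an illustration only: resolve_reverse is a hypothetical helper, not an Rsearch function, the mock objects are plain data frames rather than tibbles, and the column names Header, Sequence, and Quality are placeholders.

```r
# Hypothetical helper (not part of Rsearch): an explicit reverse argument
# wins; otherwise fall back to the "reverse" attribute of a "pe_df" object.
resolve_reverse <- function(fastq_input, reverse = NULL) {
  if (!is.null(reverse)) return(reverse)
  if (inherits(fastq_input, "pe_df")) return(attr(fastq_input, "reverse"))
  stop("reverse reads are required unless fastq_input is a pe_df object")
}

# Mock forward and reverse reads (placeholder column names).
fwd_reads <- data.frame(Header = "read1", Sequence = "ACGT", Quality = "IIII",
                        stringsAsFactors = FALSE)
rev_reads <- data.frame(Header = "read1", Sequence = "TTGA", Quality = "IIII",
                        stringsAsFactors = FALSE)

# Mock paired-end object: forward reads carrying the reverse reads
# in a "reverse" attribute.
pe <- structure(fwd_reads, class = c("pe_df", class(fwd_reads)),
                reverse = rev_reads)

from_attribute <- resolve_reverse(pe)

# An explicitly supplied reverse table overrides the attribute.
other_reads <- data.frame(Header = "read1", Sequence = "GGCC", Quality = "IIII",
                          stringsAsFactors = FALSE)
from_argument <- resolve_reverse(pe, reverse = other_reads)
```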
The best possible truncation option (truncee_rate) for merging is measured by the number of merged read-pairs with a copy number above the number specified by min_size after dereplication.
Changing min_size will affect the results. A low min_size will include merged sequences with a lower copy number after dereplication, while a higher min_size will filter out more reads and count only high-frequency merged sequences.
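The effect of min_size can be illustrated with a toy dereplication in base R. This sketch rests on two assumptions that the text does not settle: min_size is treated as an inclusive minimum (following the argument description "Minimum copy number"), and the score is the total number of reads belonging to the retained unique sequences. The helper count_kept and the sequences are invented for illustration.

```r
# Toy set of merged reads (invented sequences).
merged <- c("ACGT", "ACGT", "ACGT", "TTGA", "TTGA", "GGCC")

# Dereplicate: copy number (size) of each unique merged sequence.
sizes <- table(merged)

# Hypothetical scoring helper: total reads in unique sequences whose copy
# number is at least min_size (inclusive threshold is an assumption).
count_kept <- function(sizes, min_size) {
  sum(sizes[sizes >= min_size])
}

kept_1 <- count_kept(sizes, 1)  # singletons included
kept_2 <- count_kept(sizes, 2)  # the documented default
kept_3 <- count_kept(sizes, 3)  # only the most abundant sequence remains
```

Raising min_size from 1 to 3 shrinks the count from 6 to 3 in this toy example, which is why results obtained with different min_size values are not directly comparable.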
Examples
if (FALSE) { # \dontrun{
# Locate the example FASTQ files shipped with Rsearch
extdata <- file.path(path.package("Rsearch"), "extdata")
R1.file <- file.path(extdata, "small_R1.fq")
R2.file <- file.path(extdata, "small_R2.fq")

# Run the optimizing function with default settings
optimize.tbl <- vs_optimize_truncee_rate(fastq_input = R1.file,
                                         reverse = R2.file)

# Display the plot stored in the "plot" attribute
print(attr(optimize.tbl, "plot"))
} # }