vs_optimize_truncqual optimizes the truncation parameter truncqual to achieve the best possible merging results. The function iterates through a specified range of truncqual values to identify the optimal value that maximizes the proportion of high-quality merged read pairs.

Usage

vs_optimize_truncqual(
  fastq_input,
  reverse = NULL,
  minovlen = 10,
  truncqual_range = 1:20,
  minlen = 1,
  min_size = 2,
  maxee_rate = 0.01,
  threads = 1,
  plot_title = TRUE,
  tmpdir = NULL
)

Arguments

fastq_input

(Required). A FASTQ file path, FASTQ tibble (forward reads), or a paired-end tibble of class "pe_df". See Details.

reverse

(Optional). A FASTQ file path or FASTQ tibble (reverse reads). Optional if fastq_input is a "pe_df" object.

minovlen

(Optional). Minimum overlap between the merged reads. Must be at least 5. Defaults to 10.

truncqual_range

(Optional). A numeric vector of truncqual values to test. Sequences are truncated starting from the first base with the specified base quality score or lower. Defaults to 1:20.

minlen

(Optional). Minimum number of bases a sequence must have to be retained. Defaults to 1. See Details.

min_size

(Optional). Minimum copy number (size) for a merged read to be included in the results. Defaults to 2.

maxee_rate

(Optional). Threshold for average expected error. Must range from 0.0 to 1.0. Defaults to 0.01. See Details.

threads

(Optional). Number of computational threads to be used by VSEARCH. Defaults to 1.

plot_title

(Optional). If TRUE (default), a summary title will be displayed in the plot. Set to FALSE for no title.

tmpdir

(Optional). Path to the directory where temporary files should be written when tables are used as input or output. Defaults to NULL, which resolves to the session-specific temporary directory (tempdir()).
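To make the truncqual_range and maxee_rate arguments more concrete, here is a minimal base-R sketch of what truncating at a quality threshold and computing an average expected error look like for a single read. The helper names (phred_scores, truncate_at_quality, ee_rate) and the toy read are hypothetical and are not part of Rsearch or VSEARCH; the sketch assumes Phred+33 encoded qualities and only illustrates the idea, not how the package implements it.

```r
# Illustration only: toy helpers, not part of Rsearch or VSEARCH.

# Convert a Phred+33 quality string to integer quality scores.
phred_scores <- function(qual) {
  utf8ToInt(qual) - 33L
}

# Truncate a read at the first base whose quality is <= truncqual.
truncate_at_quality <- function(seq, qual, truncqual) {
  q <- phred_scores(qual)
  low <- which(q <= truncqual)
  keep <- if (length(low) == 0) nchar(seq) else low[1] - 1L
  list(seq = substr(seq, 1, keep), qual = substr(qual, 1, keep))
}

# Average expected error per base: mean of 10^(-Q/10) over the read.
ee_rate <- function(qual) {
  q <- phred_scores(qual)
  mean(10^(-q / 10))
}

read_seq  <- "ACGTACGTAC"
read_qual <- "IIIIII5II#"   # Q40 x 6, Q20, Q40 x 2, Q2

trimmed <- truncate_at_quality(read_seq, read_qual, truncqual = 20)
trimmed$seq             # "ACGTAC": cut before the Q20 base at position 7
ee_rate(trimmed$qual)   # 1e-04, which would pass maxee_rate = 0.01
```

A higher truncqual cuts reads earlier, which lowers the expected error but also shortens the reads and therefore the overlap available for merging. That trade-off is what the function explores across truncqual_range.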

Value

A data frame with the following columns:

  • truncqual_value: Tested truncqual value.

  • merged_read_pairs: Count of merged read-pairs with a copy number above min_size after dereplication.

  • R1_length: Average length of R1-reads after trimming.

  • R2_length: Average length of R2-reads after trimming.

The returned data frame has an attribute named "plot" containing a ggplot2 object based on the returned data frame. The plot visualizes truncqual values against merged_read_pairs, R1_length, and R2_length, with the optimal truncqual value marked by a red dashed line.
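As a rough sketch of how the returned table might be inspected, the following uses a hand-made data frame with the documented column names. The numbers are invented for illustration, and the sketch assumes the optimal value corresponds to the row with the largest merged_read_pairs, as the Details section suggests.

```r
# Hypothetical result with the documented columns; the numbers are made up.
res <- data.frame(
  truncqual_value   = c(1, 5, 10, 15, 20),
  merged_read_pairs = c(120, 180, 240, 210, 90),
  R1_length         = c(250, 247, 240, 228, 205),
  R2_length         = c(249, 243, 231, 214, 186)
)

# Row with the highest number of merged read pairs.
best <- res[which.max(res$merged_read_pairs), ]
best$truncqual_value   # 10

# With a real result, the plot is retrieved with attr(res, "plot").
```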

Details

The function uses vs_fastq_mergepairs, vs_fastx_trim_filt, and vs_fastx_uniques. See the documentation of those functions for a detailed description of their arguments.

If fastq_input has class "pe_df", the reverse reads will be automatically extracted from the "reverse" attribute unless explicitly provided in the reverse argument.

The best possible truncation option (truncqual) for merging is measured by the number of merged read-pairs with a copy number above the number specified by min_size after dereplication.

Changing min_size will affect the results. A lower min_size counts merged sequences with lower copy numbers after dereplication, whereas a higher min_size filters out more reads and counts only high-frequency merged sequences.
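The effect of min_size can be sketched with a toy set of merged reads in base R. The sequences and the helper count_kept are hypothetical. The sketch assumes a sequence is kept when its copy number is at least min_size, and it reports both the number of unique sequences and the reads they represent, since this page does not state which of the two the function counts.

```r
# Toy merged reads (made-up sequences), not real output.
merged <- c("ACGT", "ACGT", "ACGT", "TTGA", "TTGA", "GGCC", "CATG")

# Dereplicate: one entry per unique sequence, with its copy number (size).
sizes <- table(merged)

# Count unique sequences and the reads they represent at a given min_size.
# Assumption: a sequence is kept when size >= min_size.
count_kept <- function(sizes, min_size) {
  kept <- sizes[sizes >= min_size]
  c(unique_sequences = length(kept), reads = sum(kept))
}

count_kept(sizes, min_size = 1)  # 4 unique sequences, 7 reads
count_kept(sizes, min_size = 2)  # 2 unique sequences, 5 reads
count_kept(sizes, min_size = 3)  # 1 unique sequence, 3 reads
```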

Examples

if (FALSE) { # \dontrun{
# Define arguments
R1.file <- file.path(file.path(path.package("Rsearch"), "extdata"),
                     "small_R1.fq")
R2.file <- file.path(file.path(path.package("Rsearch"), "extdata"),
                     "small_R2.fq")

# Run optimizing function
optimize.tbl <- vs_optimize_truncqual(fastq_input = R1.file,
                                      reverse = R2.file)

# Display plot
print(attr(optimize.tbl, "plot"))

} # }