vs_fastq_join
joins paired-end sequence reads into a
single sequence with a specified gap between them using VSEARCH
.
Usage
vs_fastq_join(
fastq_input,
reverse = NULL,
output_format = "fastq",
fastaout = NULL,
fastqout = NULL,
join_padgap = "NNNNNNNN",
join_padgapq = "IIIIIIII",
fasta_width = 0,
log_file = NULL,
threads = 1,
vsearch_options = NULL,
tmpdir = NULL
)
Arguments
- fastq_input
(Required). A FASTQ file path, a FASTQ tibble object (forward reads), or a paired-end tibble of class
"pe_df"
. See Details.- reverse
(Optional). A FASTQ file path or a FASTQ tibble object (reverse reads). Optional if
fastq_input
is a"pe_df"
object. See Details.- output_format
(Optional). Desired output format of the file or tibble:
"fasta"
or"fastq"
(default).- fastaout
(Optional). Name of the FASTA output file with the joined reads. If
NULL
(default), no output is written to a file. See Details.- fastqout
(Optional). Name of the FASTQ output file with the joined reads. If
NULL
(default), no output is written to a file. See Details.- join_padgap
(Optional). Padding sequence to use in the gap between the sequences. Defaults to
"NNNNNNNN"
.- join_padgapq
(Optional). Quality of the padding sequence. Defaults to
"IIIIIIII"
, corresponding to a base quality score of 40 (a very high quality score with error probability0.0001
).- fasta_width
(Optional). Number of characters per line in the output FASTA file. Only applies if the output file is in FASTA format. Defaults to
0
, which eliminates wrapping.- log_file
(Optional). Name of the log file to capture messages from
VSEARCH
. IfNULL
(default), no log file is created.- threads
(Optional). Number of computational threads to be used by
VSEARCH
. Defaults to1
.- vsearch_options
(Optional). Additional arguments to pass to
VSEARCH
. Defaults toNULL
. See Details.- tmpdir
(Optional). Path to the directory where temporary files should be written when tables are used as input or output. Defaults to
NULL
, which resolves to the session-specific temporary directory (tempdir()
).
Value
A tibble or NULL
.
If fastaout
or fastqout
is specified, the joined sequences are
written to the specified output file, and no tibble is returned.
If fastaout
or fastqout
is NULL
, a tibble containing the
joined reads in the format specified by output_format
is returned.
Details
Read pairs from the input FASTQ files (fastq_input
and reverse
)
are joined into a single sequence by adding a gap with a specified padding
sequence. The resulting sequences consist of the forward read, the padding
sequence, and the reverse complement of the reverse read.
fastq_input
and reverse
can either be file paths to FASTQ files
or FASTQ objects. FASTQ objects are tibbles that contain the columns
Header
, Sequence
, and Quality
, see
readFastq
. Forward and reverse reads must appear in the same
order and have the same total number of reads in both files.
If fastq_input
is an object of class "pe_df"
, the reverse reads
are automatically extracted from its "reverse"
attribute unless
explicitly provided via the reverse
argument. This simplifies function
calls when using paired-end tibbles created by functions such as
fastx_synchronize
or vs_fastx_trim_filt
.
If fastaout
or fastqout
is specified, the joined reads are
written to the respective file in either FASTA or FASTQ format.
If both fastaout
or fastqout
are NULL
, the results are
returned as a FASTA or FASTQ object, and no file is written.
output_format
must match the desired output files/objects.
Any input sequence with fewer bases than the value set in minlen
is
discarded. By default, minlen
is set to 0, which means that no
sequences are removed. However, using the default value may allow empty
sequences to remain in the results.
vsearch_options
allows users to pass additional command-line arguments
to VSEARCH
that are not directly supported by this function. Refer to
the VSEARCH
manual for more details.
Examples
if (FALSE) { # \dontrun{
# Define arguments
fastq_input <- file.path(file.path(path.package("Rsearch"), "extdata"),
"small_R1.fq")
reverse <- file.path(file.path(path.package("Rsearch"), "extdata"),
"small_R2.fq")
output_format <- "fastq"
# Execute joining and return a FASTQ tibble
join_seqs <- vs_fastq_join(fastq_input = fastq_input,
reverse = reverse,
output_format = output_format)
# Execute joining and write joined sequences to file
vs_fastq_join(fastq_input = fastq_input,
reverse = reverse,
fastqout = "joined_sequences.fq",
output_format = output_format)
} # }