fastx_synchronize
Synchronizes sequences between two FASTA/FASTQ files or objects by retaining only the sequences present in both.
Usage
fastx_synchronize(
file1,
file2 = NULL,
file_format = "fastq",
file1_out = NULL,
file2_out = NULL
)
Arguments
- file1
(Required). A FASTA/FASTQ file path, a FASTA/FASTQ tibble, or a paired-end tibble of class "pe_df". See Details.
- file2
(Optional). A FASTA/FASTQ file path or a FASTA/FASTQ tibble. May be omitted if file1 is a "pe_df" object. See Details.
- file_format
(Optional). Format of the input (file1 and file2) and the desired output format: "fasta" or "fastq" (default). This determines the format for both outputs.
- file1_out
(Optional). Name of the output file for synchronized reads from file1. The file is in either FASTA or FASTQ format, depending on file_format. If NULL (default), no sequences are written to a file. See Details.
- file2_out
(Optional). Name of the output file for synchronized reads from file2. The file is in either FASTA or FASTQ format, depending on file_format. If NULL (default), no sequences are written to a file. See Details.
Value
A tibble or NULL.
If both file1_out and file2_out are NULL, a tibble containing the synchronized reads from file1 is returned. The synchronized reads from file2 are accessible via the "reverse" attribute of the returned tibble.
If both file1_out and file2_out are specified, the synchronized sequences are written to the specified output files, and no tibble is returned.
Details
file1 and file2 can either be paths to FASTA/FASTQ files or tibble objects containing the sequences.
FASTA objects are tibbles that contain the columns Header and Sequence, see readFasta. FASTQ objects are tibbles that contain the columns Header, Sequence, and Quality, see readFastq.
If file1 is an object of class "pe_df", the second read tibble is automatically extracted from its "reverse" attribute unless explicitly provided via the file2 argument. This allows streamlined input handling for paired-end tibbles created by vs_fastx_trim_filt.
Sequence IDs in the Header fields must be identical for each read pair in both file1 and file2 for synchronization to work correctly.
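The idea of matching reads on their Header can be sketched in base R. This is only an illustration of the concept, not the package's implementation: the toy data, the use of plain data frames in place of tibbles, and the object names r1, r2, r1_sync and r2_sync are all assumptions made for the example.

```r
# Toy read tables; plain data frames stand in for FASTQ tibbles here
r1 <- data.frame(
  Header   = c("read1", "read2", "read3"),
  Sequence = c("ACGT", "TTGA", "GGCC"),
  Quality  = c("IIII", "FFFF", "HHHH"),
  stringsAsFactors = FALSE
)
r2 <- data.frame(
  Header   = c("read3", "read1", "read4"),
  Sequence = c("CCGG", "ACGA", "TTTT"),
  Quality  = c("HHHH", "IIII", "FFFF"),
  stringsAsFactors = FALSE
)

# IDs present in both tables, in the order they occur in r1
common <- intersect(r1$Header, r2$Header)

# Keep only the shared reads and align the rows of r2 to those of r1
r1_sync <- r1[match(common, r1$Header), ]
r2_sync <- r2[match(common, r2$Header), ]
```

Reads without a partner ("read2" in r1, "read4" in r2) are dropped, and the remaining rows of the two tables line up pair by pair. A Header that differs between the two files, even slightly, would make a pair look unmatched, which is why the IDs must be identical.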
If file1_out and file2_out are specified, the synchronized sequences are written to these files in the format specified by file_format.
If file1_out and file2_out are NULL, the function returns a FASTA/FASTQ object containing synchronized reads from file1. The synchronized reads from file2 are included as an attribute named "reverse" in the returned tibble.
The returned tibble is assigned the S3 class "pe_df", indicating that it represents paired-end sequence data. Downstream functions can use this class tag to distinguish paired-end tibbles from other tibbles.
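A minimal base R sketch of how such a class tag and a "reverse" attribute can be attached and later detected is given here. How the package sets these internally is not documented in this page, so the construction step is an assumption; the names fwd, rvs and r2, and the use of data frames instead of tibbles, are hypothetical.

```r
# Toy forward and reverse read tables (data frames stand in for tibbles)
fwd <- data.frame(Header = "read1", Sequence = "ACGT", Quality = "IIII",
                  stringsAsFactors = FALSE)
rvs <- data.frame(Header = "read1", Sequence = "TGCA", Quality = "FFFF",
                  stringsAsFactors = FALSE)

# Attach the reverse reads as an attribute and prepend the class tag
attr(fwd, "reverse") <- rvs
class(fwd) <- c("pe_df", class(fwd))

# Downstream code can test for the tag and retrieve the second read table
if (inherits(fwd, "pe_df")) {
  r2 <- attr(fwd, "reverse")
}
```

Because "pe_df" is prepended rather than substituted, the object still behaves as an ordinary table for functions that know nothing about the tag.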
Both file1_out and file2_out must either be NULL or both must be character strings specifying the file paths.
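The both-or-neither rule can be checked before calling the function. The helper below is hypothetical (check_out_args is not part of the package) and only sketches one way to express the rule; the package's own validation and error message may differ.

```r
# Hypothetical helper: enforce that both output paths are NULL
# or that both are character strings
check_out_args <- function(file1_out = NULL, file2_out = NULL) {
  both_null <- is.null(file1_out) && is.null(file2_out)
  both_chr  <- is.character(file1_out) && is.character(file2_out)
  if (!(both_null || both_chr)) {
    stop("file1_out and file2_out must both be NULL or both be character strings")
  }
  invisible(TRUE)
}

check_out_args()                                 # both NULL: passes
check_out_args("sync_R1.fastq", "sync_R2.fastq") # both paths: passes
```

Calling check_out_args("sync_R1.fastq", NULL) stops with an error, since only one output path is given.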
Examples
if (FALSE) { # \dontrun{
# Define arguments
file1 <- file.path(path.package("Rsearch"), "extdata", "small_R1.fq")
file2 <- file.path(path.package("Rsearch"), "extdata", "small_R2.fq")
file_format <- "fastq"
file1_out <- NULL
file2_out <- NULL
# Synchronize files and return as a tibble
sync_seqs <- fastx_synchronize(file1 = file1,
                               file2 = file2,
                               file_format = file_format,
                               file1_out = file1_out,
                               file2_out = file2_out)
# Extract tibbles with synchronized sequences
R1_sync <- sync_seqs
R2_sync <- attr(sync_seqs, "reverse")
# Synchronize files and write to output files
fastx_synchronize(file1 = file1,
                  file2 = file2,
                  file_format = file_format,
                  file1_out = "synchronized_R1.fastq",
                  file2_out = "synchronized_R2.fastq")
} # }