rsearch_obj standardizes and organizes data into an Rsearch object. An Rsearch object is a list containing three elements with data structures that can be used as input to build a phyloseq object in the phyloseq package.
Arguments
- readcount_data
(Required). A file path or a data frame (or tibble) containing OTU count data, typically the output from vs_cluster_size or similar. This must have one row per OTU and one column per sample. The first column must contain OTU identifiers corresponding to those in the first column of sequence_data, and the remaining columns must have names matching the sample identifiers in sample_data. OTUs and samples not found across all data structures are discarded.
- sequence_data
(Required). A file path or a data frame (or tibble) containing centroid sequences representing each OTU, typically obtained from clustering (vs_cluster_size) or denoising (vs_cluster_unoise). The first column must be called Header and contain OTU identifiers. One of the remaining columns must be named Sequence, containing the actual DNA sequences. Additional columns may include taxonomic classification data, e.g. from vs_sintax.
- sample_data
(Required). A file path or a data frame (or tibble) containing metadata about each sample. Samples are assumed to be in rows, and one of the columns must contain a unique identifier for each sample that matches the column names in readcount_data.
- sample_id_col
(Optional). A character string specifying the name of the column in sample_data that contains the unique sample identifiers. This column will be used to match sample metadata to read count data. Defaults to "sample_id".
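As a rough illustration of the input layout described above, the following sketch builds three toy data frames in base R. All values are invented, and the name of the first count-table column (OTU) and the treatment column are assumptions made for the example; only Header, Sequence, and the default sample_id come from the argument descriptions.

```r
# Toy inputs (hypothetical values) shaped as the arguments above describe.

# Read counts: one row per OTU, first column holds OTU identifiers,
# remaining columns are named after the samples.
readcount_toy <- data.frame(
  OTU = c("OTU1", "OTU2", "OTU3"),   # column name "OTU" is an assumption
  S1  = c(10, 0, 3),
  S2  = c(5, 7, 0),
  stringsAsFactors = FALSE
)

# Sequences: first column must be called Header, one column named Sequence.
sequence_toy <- data.frame(
  Header   = c("OTU1", "OTU2", "OTU3"),
  Sequence = c("ACGTACGT", "TTGACCGA", "GGCATCAA"),
  stringsAsFactors = FALSE
)

# Sample metadata: one row per sample, with a unique identifier column.
sample_toy <- data.frame(
  sample_id = c("S1", "S2"),
  treatment = c("control", "treated"),  # hypothetical metadata column
  stringsAsFactors = FALSE
)

# Quick checks mirroring the documented requirements.
otu_ids_match    <- all(readcount_toy[[1]] %in% sequence_toy$Header)
sample_ids_match <- all(names(readcount_toy)[-1] %in% sample_toy$sample_id)
```

Data frames shaped like these could presumably be passed directly as readcount_data, sequence_data, and sample_data instead of file paths.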
Value
A named list with three elements:
- readcount.mat: A numeric matrix of OTU abundances with OTUs as rows and samples as columns.
- sequence.df: A data frame with one row for each OTU sequence.
- sampledata.df: A data frame containing data about the samples.
Details
This function standardizes and organizes data into an Rsearch object: a structured list of the three key data components used or generated during the Rsearch workflow, namely read count data, sequence data, and sample data.
The function accepts three datasets (read count data, sequence data, and sample metadata) and returns a streamlined input suitable for constructing a phyloseq object using the rsearch2phyloseq function. The implementation uses a standard list in R rather than a specialized class, providing an open and easily accessible structure.
To convert this object into a phyloseq object, use rsearch2phyloseq.
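The matching step, in which OTUs and samples not found across all data structures are discarded, can be pictured with a small base R sketch. This is not the package's implementation: align_inputs is a hypothetical helper written only for illustration, and the toy data are invented. It returns a plain list using the three element names documented under Value.

```r
# Hypothetical helper: NOT the Rsearch implementation, only an illustration
# of keeping the OTUs and samples shared by all three inputs.
align_inputs <- function(readcount, sequences, samples,
                         sample_id_col = "sample_id") {
  # OTUs present in both the count table and the sequence table
  otus <- intersect(readcount[[1]], sequences$Header)
  # Samples present in both the count table columns and the metadata
  smpls <- intersect(names(readcount)[-1], samples[[sample_id_col]])

  # Subset the counts and convert them to a numeric matrix (OTUs x samples)
  rc <- readcount[match(otus, readcount[[1]]), , drop = FALSE]
  mat <- as.matrix(rc[, smpls, drop = FALSE])
  rownames(mat) <- otus

  list(
    readcount.mat = mat,
    sequence.df   = sequences[match(otus, sequences$Header), , drop = FALSE],
    sampledata.df = samples[match(smpls, samples[[sample_id_col]]), ,
                            drop = FALSE]
  )
}

# Toy data: OTU3 has no sequence and S3 has no metadata, so both are dropped.
readcount <- data.frame(OTU = c("OTU1", "OTU2", "OTU3"),
                        S1 = c(10, 0, 3), S2 = c(5, 7, 0), S3 = c(1, 1, 1))
sequences <- data.frame(Header = c("OTU1", "OTU2"),
                        Sequence = c("ACGT", "TTGA"))
samples <- data.frame(sample_id = c("S1", "S2"),
                      treatment = c("control", "treated"))

toy <- align_inputs(readcount, sequences, samples)
dim(toy$readcount.mat)
```

Because the result is an ordinary list, its elements can be inspected with the usual $ or [[ ]] accessors, which is the kind of openness the standard-list design aims for.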
Examples
if (FALSE) { # \dontrun{
# Define inputs: paths to the example files shipped with the package
extdata.dir <- file.path(path.package("Rsearch"), "extdata")
readcount.dta <- file.path(extdata.dir, "readcount_data.tsv")
sequence.dta <- file.path(extdata.dir, "sequence_data.tsv")
sample.dta <- file.path(extdata.dir, "sample_data.tsv")

# Create Rsearch object
obj <- rsearch_obj(readcount_data = readcount.dta,
                   sequence_data = sequence.dta,
                   sample_data = sample.dta,
                   sample_id_col = "sample_id")

# Convert Rsearch object to phyloseq object
phy_obj <- rsearch2phyloseq(obj, sample_id_col = "sample_id")

# Convert phyloseq object back to Rsearch object
# (named obj2 to avoid masking the rsearch_obj function)
obj2 <- phyloseq2rsearch(phy_obj)
} # }