Generates a plot representing the distribution of size values from a FASTA or FASTQ file/object.
Usage
plot_size_dist(
fastx_input,
input_format = NULL,
cutoff = NULL,
y_breaks = NULL,
plot_title = "Size distribution",
log_scale_y = TRUE,
n_bins = 30
)
Arguments
- fastx_input
(Required). A FASTA/FASTQ file path or FASTA/FASTQ object containing reads with size values embedded in the Header column. See Details.
- input_format
(Optional). The format of the input file. Must be "fasta" or "fastq" if fastx_input is a file path. Defaults to NULL.
- cutoff
(Optional). A numeric value specifying a size threshold. Reads with size greater than this value will be grouped into a single category labeled "> cutoff" in the plot. Defaults to NULL (no cutoff applied).
- y_breaks
(Optional). A numeric vector specifying the breakpoints for the y-axis if log10 scaling is applied (log_scale_y = TRUE). Defaults to NULL.
- plot_title
(Optional). The title of the plot. Defaults to "Size distribution". Set to "" for no title.
- log_scale_y
(Optional). If TRUE (default), applies a log10 scale to the y-axis. If FALSE, the y-axis remains linear.
- n_bins
(Optional). Number of bins used in the histogram if cutoff is unspecified. Defaults to 30, which is the default value in ggplot2::geom_histogram().
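The grouping described for the cutoff argument can be pictured with a small base R sketch. The size values and the helper name group_sizes are made up for illustration; this is only an approximation of the described behaviour, not the package's internal implementation.

```r
# Hypothetical helper (not part of Rsearch): collapse sizes above `cutoff`
# into a single overflow category labeled "> <cutoff>".
group_sizes <- function(sizes, cutoff) {
  overflow <- paste0("> ", cutoff)
  labels <- ifelse(sizes > cutoff, overflow, as.character(sizes))
  # Keep categories in numeric order, with the overflow category last
  lvls <- c(as.character(sort(unique(sizes[sizes <= cutoff]))), overflow)
  factor(labels, levels = lvls)
}

sizes <- c(1, 1, 2, 3, 3, 3, 150, 900)  # made-up size values
size_groups <- group_sizes(sizes, cutoff = 100)

# Count how many reads fall into each category
group_counts <- table(size_groups)
print(group_counts)
```

With a cutoff in place, every size up to the threshold keeps its own category, which may be why n_bins only applies when cutoff is unspecified.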
Details
fastx_input can either be a file path to a FASTA/FASTQ file or a FASTA/FASTQ object. FASTA objects are tibbles that contain the columns Header and Sequence, see readFasta. FASTQ objects are tibbles that contain the columns Header, Sequence, and Quality, see readFastq.
The Header column must contain the size value for each read, given as an annotation formatted as ;size=<int>.
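To make the expected header format concrete, here is a minimal base R sketch that pulls the integer out of a ;size=<int> annotation. The headers and sequences are made up, and a plain data.frame stands in for the tibble that readFasta would return; how plot_size_dist parses headers internally is not shown in this page, so treat the regular expression as an assumption.

```r
# Made-up FASTA-like object; a data.frame stands in for a tibble here
fasta_like <- data.frame(
  Header = c("seq1;size=120", "seq2;size=7", "seq3;size=1"),
  Sequence = c("ACGT", "ACGG", "TTGA"),
  stringsAsFactors = FALSE
)

# Capture the digits that follow ";size=" and convert them to integers.
# Headers without the annotation would become NA (with a warning).
sizes <- as.integer(sub(".*;size=([0-9]+).*", "\\1", fasta_like$Header))
print(sizes)
```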
The y-axis of the plot can be log10-transformed to handle large variations in read counts across different size values. If y_breaks is specified, the given breakpoints will be used. If y_breaks is NULL, ggplot2 will automatically determine suitable breaks.
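For readers unfamiliar with log-scaled histograms in ggplot2, the sketch below shows one way custom breaks can be applied to a log10 y-axis. It uses made-up size values and standard ggplot2 functions; whether plot_size_dist builds its plot this way is an assumption.

```r
library(ggplot2)

# Made-up size values whose counts span several orders of magnitude
sizes <- data.frame(size = c(rep(1, 5000), rep(2, 500), rep(10, 50), 250))

# Breakpoints to show on the log10 y-axis
y_breaks <- c(1, 50, 500, 5000)

p <- ggplot(sizes, aes(x = size)) +
  geom_histogram(bins = 30) +        # 30 bins, the ggplot2 default
  scale_y_log10(breaks = y_breaks)   # log10 y-axis with explicit breaks

# Call print(p) to display the plot
```

On a linear axis the bar for size 250 (a single read) would be invisible next to the 5000 reads of size 1; the log10 scale keeps both readable.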
Examples
if (FALSE) { # \dontrun{
# Define input file
fastx_input <- file.path(path.package("Rsearch"), "extdata",
                         "small_derep_R1.fa")

# Generate and display plot without cutoff
size_plot <- plot_size_dist(fastx_input = fastx_input,
                            input_format = "fasta")
print(size_plot)

# Generate and display plot with a cutoff at size 100
size_plot <- plot_size_dist(fastx_input = fastx_input,
                            input_format = "fasta",
                            cutoff = 100)
print(size_plot)

# Generate and display plot with custom y-axis breaks
size_plot <- plot_size_dist(fastx_input = fastx_input,
                            input_format = "fasta",
                            y_breaks = c(1, 50, 500, 5000))
print(size_plot)

# Generate and display plot with linear y-axis
size_plot <- plot_size_dist(fastx_input = fastx_input,
                            input_format = "fasta",
                            log_scale_y = FALSE)
print(size_plot)
} # }