Help for package baseq

Title:

Basic Sequence Processing Tool for Biological Data

Version:

0.2.0

Description:

Created primarily as an easy and understandable way to perform basic sequence operations surrounding the central dogma of molecular biology.

License:

GPL-3

URL:

https://github.com/ambuvjyn/baseq

BugReports:

https://github.com/ambuvjyn/baseq/issues

Encoding:

UTF-8

RoxygenNote:

7.3.3

Imports:

ggplot2

Suggests:

testthat (≥ 3.0.0), rmarkdown, knitr, Biostrings

VignetteBuilder:

knitr

Config/testthat/edition:

LazyData:

true

NeedsCompilation:

Author:

Ambu Vijayan [aut, cre], J. Sreekumar [aut] (Principal Scientist, ICAR - Central Tuber Crops Research Institute)

Maintainer:

Ambu Vijayan <ambuvjyn@gmail.com>

Packaged:

2026-03-11 22:11:40 UTC; ambuv

Depends:

R (≥ 3.5.0)

Repository:

CRAN

Date/Publication:

2026-03-11 22:30:18 UTC

Bioconductor Bridge

Description

Converts baseq sequences to Biostrings format.

Usage

as_Biostrings(s)

Arguments

s

A character vector or list of sequences

Value

A DNAStringSet object

S3 DNA Class

Description

Creates an S3 object of class baseq_dna.

Usage

as_baseq_dna(s)

Arguments

s

A character string containing the sequence

Value

A baseq_dna object

S3 RNA Class

Description

Creates an S3 object of class baseq_rna.

Usage

as_baseq_rna(s)

Arguments

s

A character string containing the sequence

Value

A baseq_rna object

Assembly Stats

Description

Computes N50, L50, and other assembly statistics.

Usage

calculate_assembly_stats(seqs)

Arguments

seqs

A character vector or list of sequences (contigs)

Value

A named numeric vector of statistics

Examples

contigs <- c("ATGC", "ATGCATGC", "ATGCATGCATGC")
calculate_assembly_stats(contigs)
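
As a rough illustration of what N50 and L50 measure, the base R sketch below sorts contig lengths from longest to shortest and finds the first contig at which the running total reaches half of the assembly length. The helper name n50_sketch is hypothetical, and this is not necessarily how calculate_assembly_stats computes its values.

```r
# Hypothetical helper, not part of baseq: N50 and L50 from contig lengths.
n50_sketch <- function(seqs) {
  lens <- sort(unname(nchar(unlist(seqs))), decreasing = TRUE)  # longest contig first
  running <- cumsum(lens)                    # cumulative assembly length
  idx <- which(running >= sum(lens) / 2)[1]  # first contig reaching half the total
  c(N50 = lens[idx], L50 = idx)
}

contigs <- c("ATGC", "ATGCATGC", "ATGCATGCATGC")
stats <- n50_sketch(contigs)
print(stats)
```

Here the total length is 24, so the longest contig (12 bases) alone reaches the halfway point.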

Protein Net Charge

Description

Calculates the net electrical charge of a protein at a given pH.

Usage

calculate_charge(s, ph = 7.4)

Arguments

s

A character string containing the protein sequence

ph

Numeric pH value (default: 7.4)

Value

Numeric net charge

Codon Usage RSCU

Description

Calculates Relative Synonymous Codon Usage (RSCU).

Usage

calculate_codon_usage(s)

Arguments

s

A character string containing the coding DNA sequence

Value

A dataframe with codon statistics

Examples

data(sars_fragment)
calculate_codon_usage(sars_fragment)

Sequence Identity

Description

Compares two sequences of equal length position by position.

Usage

calculate_identity(s1, s2)

Arguments

s1

First sequence

s2

Second sequence

Value

A list with Identity percentage and Hamming Distance

Examples

calculate_identity("ATGC", "ATGG")
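
The relationship between percent identity and Hamming distance can be sketched in base R as follows. The helper identity_sketch and the names of its list elements are hypothetical; the element names returned by calculate_identity may differ.

```r
# Hypothetical helper, not part of baseq: position-wise comparison.
identity_sketch <- function(s1, s2) {
  stopifnot(nchar(s1) == nchar(s2))  # Hamming distance needs equal lengths
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  list(
    identity = 100 * mean(a == b),   # percentage of matching positions
    hamming  = sum(a != b)           # number of mismatching positions
  )
}

res <- identity_sketch("ATGC", "ATGG")
print(res)
```

For these two sequences, three of four positions match, giving 75 percent identity and a Hamming distance of 1.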

Protein MW

Description

Calculates the molecular weight of a protein sequence.

Usage

calculate_mw(s)

Arguments

s

A character string containing the protein sequence

Value

Numeric molecular weight in Daltons

Protein pI

Description

Estimates the isoelectric point of a protein sequence.

Usage

calculate_pi(s)

Arguments

s

A character string containing the protein sequence

Value

Numeric pI value

Primer Tm

Description

Calculates the melting temperature of a primer sequence.

Usage

calculate_tm(s, salt = 50)

Arguments

s

A character string containing the sequence

salt

Numeric salt concentration in mM (default: 50)

Value

Numeric Tm in Celsius
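
The manual does not state which formula calculate_tm uses. For orientation only, the sketch below implements two widely quoted approximations: the Wallace rule for short primers and a salt-adjusted formula that takes the monovalent salt concentration into account. Both function names are hypothetical, and results from calculate_tm may differ.

```r
# Hypothetical helpers, not part of baseq.

# Wallace rule: 2 degrees per A/T and 4 degrees per G/C.
tm_wallace <- function(s) {
  b <- strsplit(toupper(s), "")[[1]]
  2 * sum(b %in% c("A", "T")) + 4 * sum(b %in% c("G", "C"))
}

# Salt-adjusted approximation; 'salt' is in mM and converted to molar.
tm_salt_adjusted <- function(s, salt = 50) {
  b <- strsplit(toupper(s), "")[[1]]
  n <- length(b)
  gc_pct <- 100 * sum(b %in% c("G", "C")) / n
  81.5 + 16.6 * log10(salt / 1000) + 0.41 * gc_pct - 675 / n
}

primer <- "ATGCATGCATGCATGCATGC"
print(tm_wallace(primer))
print(tm_salt_adjusted(primer, salt = 50))
```

In the salt-adjusted form, a higher salt concentration raises the estimated Tm through the logarithmic term.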

Batch File Cleaner

Description

Cleans all sequences in a FASTA or FASTQ file.

Usage

clean_file(input_file, type = "auto", output_dir = "")

Arguments

input_file

Path to input file

type

Sequence type ("DNA", "RNA", or "auto")

output_dir

Optional output directory

Value

Path to the cleaned file

Universal Sequence Cleaner

Description

Removes non-standard characters from DNA or RNA sequences.

Usage

clean_seq(sequence, type = "auto")

Arguments

sequence

A character string containing the sequence

type

A string "DNA", "RNA", or "auto"

Value

A character string of the cleaned sequence

Count Bases

Description

Returns a frequency table of the bases in a sequence.

Usage

count_bases(s)

Arguments

s

A character string containing the sequence

Value

A table object with base counts

Examples

data(sars_fragment)
count_bases(sars_fragment)
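
A comparable frequency table can be produced in base R by splitting the string into single characters. This is a minimal sketch on a made-up sequence, not the package's implementation.

```r
# Split a sequence into single characters and tabulate them.
dna <- "ATGCGGAT"
bases <- strsplit(dna, "")[[1]]
counts <- table(bases)
print(counts)
```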

K-mer Counting

Description

Counts the occurrences of each substring of length k (k-mer) found in the sequence.

Usage

count_kmers(s, k = 3)

Arguments

s

A character string containing the sequence

k

Integer length of k-mer

Value

A table of k-mer counts

Examples

data(sars_fragment)
count_kmers(sars_fragment, k = 3)
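
The idea behind k-mer counting can be sketched in base R by extracting every overlapping window of length k and tabulating the results. The helper kmer_sketch is hypothetical, and whether count_kmers treats windows in exactly this way is an assumption.

```r
# Hypothetical helper, not part of baseq: overlapping k-mer counts.
kmer_sketch <- function(s, k = 3) {
  n <- nchar(s)
  if (n < k) return(table(character(0)))       # sequence shorter than k
  starts <- seq_len(n - k + 1)                 # one window per start position
  table(substring(s, starts, starts + k - 1))
}

km <- kmer_sketch("ATGATGA", k = 3)
print(km)
```

A sequence of length n yields n - k + 1 windows, so the seven-base example above produces five 3-mers in total.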

Count Pattern

Description

Counts the occurrences of a specific pattern in a sequence.

Usage

count_pattern(s, p)

Arguments

s

A character string containing the sequence

p

A character string containing the pattern to count

Value

Integer count of occurrences

Examples

data(sars_fragment)
count_pattern(sars_fragment, "ATTA")
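
Whether overlapping occurrences are counted matters for repetitive patterns, and the manual does not say which convention count_pattern follows. The base R sketch below shows both conventions; the function names are hypothetical.

```r
# Hypothetical helpers, not part of baseq.

# Non-overlapping matches of a literal pattern.
count_fixed <- function(s, p) {
  hits <- gregexpr(p, s, fixed = TRUE)[[1]]
  if (hits[1] == -1) 0L else length(hits)
}

# Overlapping matches, using a zero-width lookahead (p is read as a regex).
count_overlapping <- function(s, p) {
  hits <- gregexpr(paste0("(?=", p, ")"), s, perl = TRUE)[[1]]
  if (hits[1] == -1) 0L else length(hits)
}

print(count_fixed("ATATATA", "ATA"))
print(count_overlapping("ATATATA", "ATA"))
```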

Translate DNA to Protein

Description

Translates a DNA sequence into protein in all 6 reading frames.

Usage

dna_to_protein(s, table = 1)

Arguments

s

A character string containing the DNA sequence

table

Integer indicating the NCBI genetic code table (default: 1)

Value

A list of translated protein sequences

DNA to RNA

Description

Transcribes a DNA sequence into RNA.

Usage

dna_to_rna(s)

Arguments

s

A character string containing the DNA sequence

Value

A character string of the RNA sequence

Convert FASTQ to FASTA

Description

Converts a FASTQ file to FASTA format.

Usage

fastq_to_fasta(fastq_file)

Arguments

fastq_file

Path to input FASTQ

Value

Path to output FASTA

Quality Filter FASTQ

Description

Filters FASTQ reads based on average quality score.

Usage

filter_fastq_quality(
  input_file,
  output_file,
  min_avg_quality = 20,
  phred_offset = 33
)

Arguments

input_file

Path to input FASTQ

output_file

Path to output FASTQ

min_avg_quality

Minimum average Phred score (default: 20)

phred_offset

Phred offset (default: 33)
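
The quality test itself can be sketched in base R: each character of a FASTQ quality string is converted to its code point, the Phred offset is subtracted, and the mean is compared with the threshold. The helper mean_phred and the two example quality strings are made up for illustration; file reading and writing are left out.

```r
# Hypothetical helper, not part of baseq: mean Phred score of one read.
mean_phred <- function(qual, phred_offset = 33) {
  mean(utf8ToInt(qual) - phred_offset)
}

# Two made-up quality strings: 'I' encodes Q40, '!' Q0 and '#' Q2 at offset 33.
quals <- c(read1 = "IIIIIIII", read2 = "!!!!####")
avg <- vapply(quals, mean_phred, numeric(1))
keep <- avg >= 20   # same threshold as the min_avg_quality default
print(avg)
print(keep)
```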

CpG Island Detection

Description

Identifies candidate CpG islands in a DNA sequence.

Usage

find_cpg_islands(s, window = 200)

Arguments

s

A character string containing the DNA sequence

window

Sliding window size (default: 200)

Value

A dataframe with start and end positions
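
The manual does not list the criteria find_cpg_islands applies. CpG island scans are commonly based on two per-window quantities, the GC fraction and the observed/expected CpG ratio, and the sketch below computes both for a single window. The helper name is hypothetical and any thresholds are left to the reader.

```r
# Hypothetical helper, not part of baseq: statistics for one window.
cpg_window_stats <- function(w) {
  b <- strsplit(toupper(w), "")[[1]]
  n <- length(b)
  n_c <- sum(b == "C")
  n_g <- sum(b == "G")
  n_cpg <- sum(b[-n] == "C" & b[-1] == "G")   # C immediately followed by G
  # Observed/expected CpG ratio: (CpG count * length) / (C count * G count)
  oe <- if (n_c > 0 && n_g > 0) n_cpg * n / (n_c * n_g) else 0
  c(gc = (n_c + n_g) / n, obs_exp = oe)
}

print(cpg_window_stats("CGCGCGCG"))
print(cpg_window_stats("ATATATAT"))
```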

Find Longest ORF

Description

Scans a DNA sequence in all 6 reading frames to find the longest open reading frame.

Usage

find_longest_orf(s)

Arguments

s

A character string containing the DNA sequence

Value

A character string of the longest translated protein sequence

GC Content

Description

Calculates the percentage of G and C bases in a DNA sequence.

Usage

gc_content(s)

Arguments

s

A character string containing the sequence

Value

Numeric percentage of GC content

Examples

data(sars_fragment)
gc_content(sars_fragment)
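
For reference, the calculation amounts to counting G and C characters and dividing by the sequence length. The base R helper below is a hypothetical sketch; how gc_content handles case or ambiguous bases is not stated here.

```r
# Hypothetical helper, not part of baseq: GC percentage of a sequence.
gc_sketch <- function(s) {
  b <- strsplit(toupper(s), "")[[1]]
  100 * sum(b %in% c("G", "C")) / length(b)
}

print(gc_sketch("ATGC"))
```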

Get Genetic Code

Description

Returns a mapping of codons to amino acids.

Usage

get_genetic_code(table = 1)

Arguments

table

Integer NCBI genetic code table index

Value

A named character vector

Plot AA Composition

Description

Visualizes the amino acid composition categorized by biochemical properties.

Usage

plot_aa_composition(s)

Arguments

s

A character string containing the protein sequence

Value

A ggplot object

Examples

prot <- "MKFLVLALAL"
plot_aa_composition(prot)

Plot Dot Plot

Description

Generates a dot plot comparison of two sequences.

Usage

plot_dotplot(s1, s2, window = 1)

Arguments

s1

First sequence

s2

Second sequence

window

Integer word size for matching (default: 1)

Value

A ggplot object

Examples

s1 <- "ATGCATGCATGC"
s2 <- "ATGCGTGCATGC"
plot_dotplot(s1, s2, window = 3)

Plot GC Skew

Description

Generates a sliding-window plot of GC skew, defined as (G - C)/(G + C), where G and C are the base counts within each window.

Usage

plot_gc_skew(s, window = 100)

Arguments

s

A character string containing the DNA sequence

window

Integer window size (default: 100)

Value

A ggplot object

Examples

data(sars_fragment)
plot_gc_skew(sars_fragment, window = 100)
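
The values behind such a plot can be sketched in base R. The helper below slides a window one base at a time and returns the skew for each window; the step size of 1 and the helper name are assumptions, and plot_gc_skew may place its windows differently.

```r
# Hypothetical helper, not part of baseq: GC skew per sliding window.
gc_skew_sketch <- function(s, window = 100) {
  b <- strsplit(toupper(s), "")[[1]]
  stopifnot(length(b) >= window)
  starts <- seq_len(length(b) - window + 1)
  vapply(starts, function(i) {
    w <- b[i:(i + window - 1)]
    g <- sum(w == "G")
    c_n <- sum(w == "C")
    # Skew is undefined when a window holds neither G nor C
    if (g + c_n == 0) NA_real_ else (g - c_n) / (g + c_n)
  }, numeric(1))
}

skew <- gc_skew_sketch("GGGGCCCC", window = 4)
print(skew)
```

The skew runs from 1 in an all-G window to -1 in an all-C window.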

Plot Hydrophobicity

Description

Generates a sliding window plot of protein hydrophobicity using the Kyte-Doolittle scale.

Usage

plot_hydrophobicity(s, window = 9)

Arguments

s

A character string containing the protein sequence

window

Integer window size (default: 9)

Value

A ggplot object

Examples

prot <- "MKFLVLALAL"
plot_hydrophobicity(prot, window = 3)
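
The numbers behind a hydropathy plot are window averages of per-residue Kyte-Doolittle scores. The base R sketch below computes them for the example protein; the helper name and the step size of 1 are assumptions, and the plotting step is omitted.

```r
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
kd <- c(A = 1.8, R = -4.5, N = -3.5, D = -3.5, C = 2.5, Q = -3.5, E = -3.5,
        G = -0.4, H = -3.2, I = 4.5, L = 3.8, K = -3.9, M = 1.9, F = 2.8,
        P = -1.6, S = -0.8, T = -0.7, W = -0.9, Y = -1.3, V = 4.2)

# Hypothetical helper, not part of baseq: mean hydropathy per window.
hydro_sketch <- function(s, window = 9) {
  aa <- strsplit(toupper(s), "")[[1]]
  stopifnot(length(aa) >= window)
  scores <- unname(kd[aa])                     # NA for non-standard residues
  starts <- seq_len(length(aa) - window + 1)
  vapply(starts, function(i) mean(scores[i:(i + window - 1)]), numeric(1))
}

h <- hydro_sketch("MKFLVLALAL", window = 3)
print(round(h, 2))
```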

Universal Sequence Reader

Description

Reads a FASTA or FASTQ file and returns it as a dataframe or list.

Usage

read_seq(file, format = "df")

Arguments

file

Path to the input sequence file

format

A string indicating "df" (dataframe) or "list" (default: "df")

Value

A dataframe or list of the sequence data.

Universal Reverse Complement

Description

Generates the reverse complement of a DNA or RNA sequence.

Usage

rev_comp(sequence)

Arguments

sequence

A character string containing the sequence

Value

A character string of the reverse complement
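
In base R the operation can be sketched as a character substitution followed by a reversal. The helper below is hypothetical; it treats any sequence containing U as RNA, which is an assumption about how the DNA or RNA alphabet might be chosen.

```r
# Hypothetical helper, not part of baseq: reverse complement of DNA or RNA.
rev_comp_sketch <- function(s) {
  s <- toupper(s)
  is_rna <- grepl("U", s, fixed = TRUE)        # assumption: U implies RNA
  comp <- if (is_rna) chartr("ACGU", "UGCA", s) else chartr("ACGT", "TGCA", s)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

print(rev_comp_sketch("ATGC"))
print(rev_comp_sketch("AUGC"))
```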

Reverse Translation

Description

Converts a protein sequence back into DNA using common codons.

Usage

reverse_translate(s)

Arguments

s

A character string containing the protein sequence

Value

A character string of the resulting DNA sequence

RNA to DNA

Description

Reverse transcribes an RNA sequence into DNA.

Usage

rna_to_dna(s)

Arguments

s

A character string containing the RNA sequence

Value

A character string of the DNA sequence

Translate RNA to Protein

Description

Translates an RNA sequence into protein in all 6 reading frames.

Usage

rna_to_protein(s, table = 1)

Arguments

s

A character string containing the RNA sequence

table

Integer indicating the NCBI genetic code table (default: 1)

Value

A list of translated protein sequences

SARS-CoV-2 Genome Fragment

Description

A small fragment of the SARS-CoV-2 genome used for examples and testing.

Usage

sars_fragment

Format

A character string.

Source

NCBI GenBank

Motif Searching

Description

Finds all occurrences of a motif in a sequence.

Usage

search_motif(s, p)

Arguments

s

A character string containing the sequence

p

A character string containing the motif (regex)

Value

A dataframe with the Start, End, and Match string
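
A regex search that reports positions can be sketched with base R's gregexpr. The helper motif_sketch is hypothetical; it mirrors the column names listed above, but details such as overlapping matches may differ from search_motif.

```r
# Hypothetical helper, not part of baseq: regex hits with their positions.
motif_sketch <- function(s, p) {
  m <- gregexpr(p, s, perl = TRUE)[[1]]
  if (m[1] == -1) {                            # no match at all
    return(data.frame(Start = integer(0), End = integer(0),
                      Match = character(0)))
  }
  starts <- as.integer(m)
  ends <- starts + attr(m, "match.length") - 1L
  data.frame(Start = starts, End = ends, Match = substring(s, starts, ends))
}

hits <- motif_sketch("ATGAAATGCCATGA", "ATG[AC]")
print(hits)
```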

Shuffle Sequence

Description

Randomly permutes the characters of a sequence.

Usage

shuffle_sequence(s)

Arguments

s

A character string containing the sequence

Value

A character string of the shuffled sequence

Virtual Digestion

Description

Simulates restriction enzyme digestion.

Usage

simulate_digestion(s, p)

Arguments

s

A character string containing the DNA sequence

p

A character string containing the restriction site (regex)

Value

A numeric vector of fragment lengths
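
Fragment lengths follow from the distances between consecutive cut positions. The sketch below assumes a linear molecule that is cut immediately before each recognition site; where simulate_digestion places the cut within the site is not stated here, so the exact lengths it returns may differ. The helper name is hypothetical.

```r
# Hypothetical helper, not part of baseq: fragment lengths after digestion.
digest_sketch <- function(s, site) {
  m <- gregexpr(site, s, perl = TRUE)[[1]]
  if (m[1] == -1) return(nchar(s))             # no site: one uncut fragment
  cuts <- as.integer(m) - 1L                   # cut just before each site (assumption)
  frags <- diff(c(0L, cuts, nchar(s)))         # distances between cut positions
  frags[frags > 0]                             # drop empty fragments
}

frags <- digest_sketch("AAAGAATTCTTTTGAATTCCC", "GAATTC")
print(frags)
```

The fragment lengths always sum to the length of the input sequence.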

Simulate FASTA File

Description

Generates a dummy FASTA dataset.

Usage

simulate_fasta(n_seq = 5, seq_len = 100, gc = NULL, type = "DNA", file = NULL)

Arguments

n_seq

Number of sequences

seq_len

Length of each sequence

gc

Target GC content

type

"DNA" or "RNA"

file

Optional file path to save

Value

A dataframe of simulated sequences

Simulate FASTQ File

Description

Generates a dummy FASTQ dataset.

Usage

simulate_fastq(
  n_reads = 5,
  read_len = 100,
  gc = NULL,
  type = "DNA",
  file = NULL
)

Arguments

n_reads

Number of reads

read_len

Length of each read

gc

Target GC content

type

"DNA" or "RNA"

file

Optional file path to save

Value

A dataframe of simulated reads

PCR Simulator

Description

Simulates a PCR reaction and predicts amplicon sizes.

Usage

simulate_pcr(template, fwd, rev_p)

Arguments

template

A character string containing the DNA template

fwd

A character string of the forward primer

rev_p

A character string of the reverse primer

Value

A numeric vector of amplicon sizes

Simulate Sequence

Description

Generates a random DNA or RNA sequence.

Usage

simulate_sequence(len, gc = NULL, type = "DNA")

Arguments

len

Integer length of the sequence

gc

Numeric target GC content (0 to 1)

type

"DNA" or "RNA"

Value

A character string of the simulated sequence
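
One way to generate such a sequence in base R is to sample bases with probabilities derived from the target GC content. The helper below is a hypothetical sketch; it uses 0.5 where simulate_sequence accepts gc = NULL, which is an assumption about the default behaviour.

```r
# Hypothetical helper, not part of baseq: random sequence with a GC target.
simulate_sketch <- function(len, gc = 0.5, type = "DNA") {
  alphabet <- if (type == "RNA") c("A", "U", "G", "C") else c("A", "T", "G", "C")
  # A and T (or U) share the non-GC probability; G and C share the GC probability
  probs <- c((1 - gc) / 2, (1 - gc) / 2, gc / 2, gc / 2)
  paste(sample(alphabet, len, replace = TRUE, prob = probs), collapse = "")
}

set.seed(1)   # fixed seed so the example is reproducible
x <- simulate_sketch(50, gc = 0.6)
print(x)
```

Because bases are drawn independently, the realised GC content of a short sequence only approximates the target.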

FASTA Summary

Description

Generates a comprehensive summary of a multi-FASTA file.

Usage

summarize_fasta(file)

Arguments

file

Path to the FASTA file

Value

A summary dataframe

Examples


# summarize_fasta("path/to/my.fasta")

Generic Translate

Description

Generic function to translate DNA or RNA to protein.

Usage

translate(x, ...)

Arguments

x

A baseq_dna or baseq_rna object

...

Additional arguments

Value

A list of translated sequences

Universal Sequence Writer

Description

Writes a sequence object (dataframe or list) to a FASTA or FASTQ file.

Usage

write_seq(x, file)

Arguments

x

A sequence object (dataframe or list)

file

Path to the output sequence file

Value

Invisible TRUE
