Title: Basic Sequence Processing Tool for Biological Data
Version: 0.2.0
Description: Primarily created as an easy and understandable way to perform basic sequence processing surrounding the central dogma of molecular biology.
License: GPL-3
URL: https://github.com/ambuvjyn/baseq
BugReports: https://github.com/ambuvjyn/baseq/issues
Encoding: UTF-8
RoxygenNote: 7.3.3
Imports: ggplot2
Suggests: testthat (≥ 3.0.0), rmarkdown, knitr, Biostrings
VignetteBuilder: knitr
Config/testthat/edition: 3
LazyData: true
NeedsCompilation: no
Author: Ambu Vijayan [aut, cre], J. Sreekumar [aut] (Principal Scientist, ICAR - Central Tuber Crops Research Institute)
Maintainer: Ambu Vijayan <ambuvjyn@gmail.com>
Packaged: 2026-03-11 22:11:40 UTC; ambuv
Depends: R (≥ 3.5.0)
Repository: CRAN
Date/Publication: 2026-03-11 22:30:18 UTC

Bioconductor Bridge

Description

Converts baseq sequences to Biostrings format.

Usage

as_Biostrings(s)

Arguments

s

A character vector or list of sequences

Value

A DNAStringSet object


S3 DNA Class

Description

Creates an S3 object of class baseq_dna.

Usage

as_baseq_dna(s)

Arguments

s

A character string containing the sequence

Value

A baseq_dna object


S3 RNA Class

Description

Creates an S3 object of class baseq_rna.

Usage

as_baseq_rna(s)

Arguments

s

A character string containing the sequence

Value

A baseq_rna object


Assembly Stats

Description

Computes N50, L50, and other assembly statistics.

Usage

calculate_assembly_stats(seqs)

Arguments

seqs

A character vector or list of sequences (contigs)

Value

A named numeric vector of statistics

Examples

contigs <- c("ATGC", "ATGCATGC", "ATGCATGCATGC")
calculate_assembly_stats(contigs)
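
For readers unfamiliar with N50 and L50, the definitions can be sketched in base R. This is a minimal illustration and not baseq's implementation; `n50_sketch` is a hypothetical helper. N50 is the length of the contig at which the running total, taken from the longest contig down, first reaches half the assembly size, and L50 is the number of contigs needed to get there.

```r
# Hypothetical helper: N50 and L50 from contig lengths (not baseq's code).
n50_sketch <- function(seqs) {
  lens <- sort(nchar(unlist(seqs)), decreasing = TRUE)  # longest contig first
  running <- cumsum(lens)                               # cumulative assembly size
  idx <- which(running >= sum(lens) / 2)[1]             # first contig reaching 50%
  c(N50 = lens[idx], L50 = idx)
}

contigs <- c("ATGC", "ATGCATGC", "ATGCATGCATGC")
n50_sketch(contigs)  # N50 = 12, L50 = 1
```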

Protein Net Charge

Description

Calculates the net electrical charge of a protein at a given pH.

Usage

calculate_charge(s, ph = 7.4)

Arguments

s

A character string containing the protein sequence

ph

Numeric pH value (default: 7.4)

Value

Numeric net charge
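
Net charge at a given pH is commonly estimated with the Henderson-Hasselbalch equation, summing the fractional charge of each ionizable group. The sketch below shows that idea only. The pKa values are illustrative (published sets differ, and the set baseq uses is not stated here), and `charge_sketch` is a hypothetical helper.

```r
# Hypothetical helper; pKa values are illustrative assumptions.
charge_sketch <- function(s, ph = 7.4) {
  aa <- strsplit(toupper(s), "")[[1]]
  pos_pka <- c(K = 10.5, R = 12.5, H = 6.0)            # basic side chains
  neg_pka <- c(D = 3.9, E = 4.1, C = 8.3, Y = 10.1)    # acidic side chains
  n_term <- 9.0                                        # free amino terminus
  c_term <- 2.0                                        # free carboxyl terminus
  # Fraction protonated (charge +1) for each basic group
  pos <- sum(sapply(names(pos_pka), function(a) {
    sum(aa == a) / (1 + 10^(ph - pos_pka[[a]]))
  }))
  # Fraction deprotonated (charge -1) for each acidic group
  neg <- sum(sapply(names(neg_pka), function(a) {
    sum(aa == a) / (1 + 10^(neg_pka[[a]] - ph))
  }))
  pos + 1 / (1 + 10^(ph - n_term)) - neg - 1 / (1 + 10^(c_term - ph))
}

charge_sketch("MKFLVLALAL", ph = 7.4)
```

Under this model the charge falls monotonically as pH rises, which is why an isoelectric point can be located by searching for the pH where the charge crosses zero.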


Codon Usage RSCU

Description

Calculates Relative Synonymous Codon Usage (RSCU).

Usage

calculate_codon_usage(s)

Arguments

s

A character string containing the coding DNA sequence

Value

A dataframe with codon statistics

Examples

data(sars_fragment)
calculate_codon_usage(sars_fragment)
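
RSCU for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. A minimal sketch for a single amino acid, using made-up counts for the four alanine codons (the counts are hypothetical, not taken from sars_fragment):

```r
# Hypothetical counts of the four alanine codons in some coding sequence.
ala_counts <- c(GCT = 3, GCC = 1, GCA = 0, GCG = 0)

# Expected count under equal usage is the mean count across synonyms.
rscu <- ala_counts / mean(ala_counts)
rscu  # GCT = 3, GCC = 1, GCA = 0, GCG = 0
```

A value above 1 means the codon is used more often than expected under equal usage, and a value below 1 means it is used less often.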

Sequence Identity

Description

Compares two sequences of equal length and reports their percent identity and Hamming distance.

Usage

calculate_identity(s1, s2)

Arguments

s1

First sequence

s2

Second sequence

Value

A list with Identity percentage and Hamming Distance

Examples

calculate_identity("ATGC", "ATGG")
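
The two quantities can be sketched in base R as follows. `identity_sketch` and the names of its list elements are hypothetical and may not match what calculate_identity returns.

```r
# Hypothetical helper: Hamming distance and percent identity.
identity_sketch <- function(s1, s2) {
  stopifnot(nchar(s1) == nchar(s2))       # defined only for equal lengths
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  hamming <- sum(a != b)                  # number of mismatched positions
  list(identity = 100 * (1 - hamming / length(a)), hamming = hamming)
}

identity_sketch("ATGC", "ATGG")  # identity = 75, hamming = 1
```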

Protein MW

Description

Calculates the molecular weight of a protein sequence.

Usage

calculate_mw(s)

Arguments

s

A character string containing the protein sequence

Value

Numeric molecular weight in Daltons


Protein pI

Description

Estimates the isoelectric point of a protein sequence.

Usage

calculate_pi(s)

Arguments

s

A character string containing the protein sequence

Value

Numeric pI value


Primer Tm

Description

Calculates the melting temperature of a primer sequence.

Usage

calculate_tm(s, salt = 50)

Arguments

s

A character string containing the sequence

salt

Numeric salt concentration in mM (default: 50)

Value

Numeric Tm in Celsius
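
The formula calculate_tm uses is not stated in this entry. For orientation only, the Wallace rule is a common rough estimate for short primers: 2 degrees per A or T and 4 degrees per G or C. It ignores salt concentration, so it is not equivalent to a salt-adjusted calculation. `tm_wallace_sketch` is a hypothetical helper.

```r
# Hypothetical helper: Wallace rule, Tm = 2 * (A + T) + 4 * (G + C).
tm_wallace_sketch <- function(s) {
  b <- strsplit(toupper(s), "")[[1]]
  2 * sum(b %in% c("A", "T")) + 4 * sum(b %in% c("G", "C"))
}

tm_wallace_sketch("ATGCATGC")  # 24
```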


Batch File Cleaner

Description

Cleans all sequences in a FASTA or FASTQ file.

Usage

clean_file(input_file, type = "auto", output_dir = "")

Arguments

input_file

Path to input file

type

Sequence type: "DNA", "RNA", or "auto" (default: "auto")

output_dir

Optional output directory

Value

Path to the cleaned file


Universal Sequence Cleaner

Description

Removes non-standard characters from DNA or RNA sequences.

Usage

clean_seq(sequence, type = "auto")

Arguments

sequence

A character string containing the sequence

type

A string giving the sequence type: "DNA", "RNA", or "auto" (default: "auto")

Value

A character string of the cleaned sequence


Count Bases

Description

Returns a frequency table of the bases in a sequence.

Usage

count_bases(s)

Arguments

s

A character string containing the sequence

Value

A table object with base counts

Examples

data(sars_fragment)
count_bases(sars_fragment)
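
A comparable frequency table can be built in base R by splitting the string into characters. This is a sketch of the idea on a short made-up sequence, not baseq's implementation.

```r
s <- "AACGTTTG"                            # made-up example sequence
base_tab <- table(strsplit(s, "")[[1]])    # one count per distinct character
base_tab  # A = 2, C = 1, G = 2, T = 3
```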

K-mer Counting

Description

Counts the occurrences of every substring of length k (k-mer) found in a sequence.

Usage

count_kmers(s, k = 3)

Arguments

s

A character string containing the sequence

k

Integer length of k-mer (default: 3)

Value

A table of k-mer counts

Examples

data(sars_fragment)
count_kmers(sars_fragment, k = 3)
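
K-mer counting slides a window of width k along the sequence one position at a time. A minimal base-R sketch, with `kmer_sketch` as a hypothetical helper:

```r
# Hypothetical helper: count overlapping k-mers with a sliding window.
kmer_sketch <- function(s, k = 3) {
  stopifnot(nchar(s) >= k)
  starts <- seq_len(nchar(s) - k + 1)            # every window start position
  table(substring(s, starts, starts + k - 1))    # extract and tabulate k-mers
}

kmer_sketch("ATGATG", k = 3)  # ATG = 2, GAT = 1, TGA = 1
```

A sequence of length n yields n - k + 1 windows, so the counts in the table sum to that number.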

Count Pattern

Description

Counts the occurrences of a specific pattern in a sequence.

Usage

count_pattern(s, p)

Arguments

s

A character string containing the sequence

p

A character string containing the pattern to count

Value

Integer count of occurrences

Examples

data(sars_fragment)
count_pattern(sars_fragment, "ATTA")
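
Pattern counts depend on whether overlapping occurrences are counted, and this entry does not say which convention count_pattern follows. The base-R sketch below shows both conventions on a made-up string where they differ.

```r
s <- "ATATA"

# Non-overlapping: the search resumes after the end of each match.
m <- gregexpr("ATA", s, fixed = TRUE)[[1]]
non_overlapping <- sum(m > 0)      # 1

# Overlapping: a zero-width lookahead matches at every start position.
m2 <- gregexpr("(?=ATA)", s, perl = TRUE)[[1]]
overlapping <- sum(m2 > 0)         # 2
```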

Translate DNA to Protein

Description

Translates a DNA sequence into protein in all 6 reading frames.

Usage

dna_to_protein(s, table = 1)

Arguments

s

A character string containing the DNA sequence

table

Integer indicating the NCBI genetic code table (default: 1)

Value

A list of translated protein sequences
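
The six reading frames are the three offsets (0, 1, 2) on the given strand plus the same three offsets on its reverse complement. The sketch below only splits a sequence into codons for each frame and does not translate them. All helper names are hypothetical, and the order of frames in baseq's output is not stated here.

```r
# Hypothetical helpers: split a DNA string into codons for six frames.
codons_in_frame <- function(s, offset) {
  if (nchar(s) - offset < 3) return(character(0))   # no complete codon
  starts <- seq(offset + 1, nchar(s) - 2, by = 3)
  substring(s, starts, starts + 2)
}

rev_comp_dna <- function(s) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", s), "")[[1]]), collapse = "")
}

six_frames_sketch <- function(s) {
  rc <- rev_comp_dna(s)
  c(lapply(0:2, function(o) codons_in_frame(s, o)),    # forward frames
    lapply(0:2, function(o) codons_in_frame(rc, o)))   # reverse frames
}

frames <- six_frames_sketch("ATGAAATAG")
frames[[1]]  # "ATG" "AAA" "TAG"
frames[[4]]  # "CTA" "TTT" "CAT"
```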


DNA to RNA

Description

Transcribes a DNA sequence into RNA.

Usage

dna_to_rna(s)

Arguments

s

A character string containing the DNA sequence

Value

A character string of the RNA sequence


Convert FASTQ to FASTA

Description

Converts a FASTQ file to FASTA format.

Usage

fastq_to_fasta(fastq_file)

Arguments

fastq_file

Path to input FASTQ

Value

Path to output FASTA
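
A FASTQ record spans four lines (header, sequence, separator, quality), while FASTA keeps only a header and the sequence. The conversion can be sketched on an in-memory character vector as follows. This assumes strict four-line records and is not baseq's implementation.

```r
# Two made-up FASTQ records, four lines each.
fastq_lines <- c("@read1", "ATGC", "+", "IIII",
                 "@read2", "GGCC", "+", "5555")

idx <- seq(1, length(fastq_lines), by = 4)        # header line of each record
headers <- sub("^@", ">", fastq_lines[idx])       # "@" becomes ">"
seqs <- fastq_lines[idx + 1]                      # sequence follows header

# Interleave headers and sequences: header1, seq1, header2, seq2, ...
fasta_lines <- as.vector(rbind(headers, seqs))
fasta_lines
```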


Quality Filter FASTQ

Description

Filters FASTQ reads based on average quality score.

Usage

filter_fastq_quality(
  input_file,
  output_file,
  min_avg_quality = 20,
  phred_offset = 33
)

Arguments

input_file

Path to input FASTQ

output_file

Path to output FASTQ

min_avg_quality

Minimum average Phred score (default: 20)

phred_offset

Phred offset (default: 33)
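
Each quality character encodes a Phred score as its ASCII code minus the offset. The sketch below decodes made-up quality strings and applies a threshold of 20. `mean_phred` is a hypothetical helper, and whether baseq keeps reads exactly at the threshold is an assumption here.

```r
# Hypothetical helper: mean Phred score of one quality string.
mean_phred <- function(qual, phred_offset = 33) {
  mean(utf8ToInt(qual) - phred_offset)    # ASCII code minus offset
}

quals <- c(read1 = "IIII", read2 = "5555", read3 = "!!!!")
scores <- sapply(quals, mean_phred)       # 40, 20, 0
keep <- scores >= 20                      # assumed inclusive threshold
keep  # read1 TRUE, read2 TRUE, read3 FALSE
```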


CpG Island Detection

Description

Identifies candidate CpG islands in a DNA sequence.

Usage

find_cpg_islands(s, window = 200)

Arguments

s

A character string containing the DNA sequence

window

Sliding window size (default: 200)

Value

A dataframe with start and end positions
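
CpG island candidates are usually scored per window by GC fraction and by the observed/expected CpG ratio, (CpG count x window length) / (C count x G count). Commonly cited thresholds are a GC fraction of at least 0.5 and a ratio of at least 0.6; the thresholds baseq applies are not stated in this entry. `cpg_window_sketch` is a hypothetical helper that scores a single window.

```r
# Hypothetical helper: score one window for CpG island criteria.
cpg_window_sketch <- function(w) {
  w <- toupper(w)
  b <- strsplit(w, "")[[1]]
  n_c <- sum(b == "C")
  n_g <- sum(b == "G")
  m <- gregexpr("CG", w, fixed = TRUE)[[1]]
  n_cpg <- sum(m > 0)                           # number of CG dinucleotides
  gc_frac <- (n_c + n_g) / length(b)
  obs_exp <- if (n_c > 0 && n_g > 0) n_cpg * length(b) / (n_c * n_g) else 0
  c(gc_frac = gc_frac, obs_exp = obs_exp)
}

cpg_window_sketch("CGCGCGCGAT")  # gc_frac = 0.8, obs_exp = 2.5
```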


Find Longest ORF

Description

Scans a DNA sequence in all 6 reading frames to find the longest open reading frame.

Usage

find_longest_orf(s)

Arguments

s

A character string containing the DNA sequence

Value

A character string of the longest translated protein sequence


GC Content

Description

Calculates the percentage of G and C bases in a DNA sequence.

Usage

gc_content(s)

Arguments

s

A character string containing the sequence

Value

Numeric percentage of GC content

Examples

data(sars_fragment)
gc_content(sars_fragment)
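
The calculation is the count of G and C bases divided by the sequence length, times 100. A base-R sketch with a hypothetical helper `gc_sketch`:

```r
# Hypothetical helper: GC percentage of a sequence.
gc_sketch <- function(s) {
  b <- strsplit(toupper(s), "")[[1]]
  100 * sum(b %in% c("G", "C")) / length(b)
}

gc_sketch("ATGC")  # 50
```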

Get Genetic Code

Description

Returns a mapping of codons to amino acids.

Usage

get_genetic_code(table = 1)

Arguments

table

Integer NCBI genetic code table index (default: 1)

Value

A named character vector


Plot AA Composition

Description

Visualizes the amino acid composition categorized by biochemical properties.

Usage

plot_aa_composition(s)

Arguments

s

A character string containing the protein sequence

Value

A ggplot object

Examples

prot <- "MKFLVLALAL"
plot_aa_composition(prot)

Plot Dot Plot

Description

Generates a dot plot comparison of two sequences.

Usage

plot_dotplot(s1, s2, window = 1)

Arguments

s1

First sequence

s2

Second sequence

window

Integer word size for matching (default: 1)

Value

A ggplot object

Examples

s1 <- "ATGCATGCATGC"
s2 <- "ATGCGTGCATGC"
plot_dotplot(s1, s2, window = 3)

Plot GC Skew

Description

Generates a sliding window plot of GC skew (G-C)/(G+C).

Usage

plot_gc_skew(s, window = 100)

Arguments

s

A character string containing the DNA sequence

window

Integer window size (default: 100)

Value

A ggplot object

Examples

data(sars_fragment)
plot_gc_skew(sars_fragment, window = 100)
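
The values behind such a plot can be sketched without ggplot2. The helper below computes (G - C) / (G + C) over consecutive non-overlapping windows; whether baseq steps by one base or by a full window is not stated here, so the step size is an assumption. `gc_skew_sketch` is a hypothetical helper.

```r
# Hypothetical helper: GC skew per non-overlapping window.
gc_skew_sketch <- function(s, window = 100) {
  b <- strsplit(toupper(s), "")[[1]]
  stopifnot(length(b) >= window)
  starts <- seq(1, length(b) - window + 1, by = window)
  skew <- sapply(starts, function(i) {
    w <- b[i:(i + window - 1)]
    g <- sum(w == "G")
    cc <- sum(w == "C")
    if (g + cc == 0) NA_real_ else (g - cc) / (g + cc)  # undefined without G or C
  })
  data.frame(position = starts, skew = skew)
}

gc_skew_sketch("GGGGCCCCGGCC", window = 4)  # skew: 1, -1, 0
```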

Plot Hydrophobicity

Description

Generates a sliding window plot of protein hydrophobicity using the Kyte-Doolittle scale.

Usage

plot_hydrophobicity(s, window = 9)

Arguments

s

A character string containing the protein sequence

window

Integer window size (default: 9)

Value

A ggplot object

Examples

prot <- "MKFLVLALAL"
plot_hydrophobicity(prot, window = 3)
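
The numbers behind the plot are sliding-window means of per-residue Kyte-Doolittle values. A base-R sketch, with `hydro_sketch` as a hypothetical helper (baseq's handling of window edges is not stated here; this version reports only full windows):

```r
# Kyte-Doolittle hydropathy values per amino acid.
kd <- c(A = 1.8, R = -4.5, N = -3.5, D = -3.5, C = 2.5, Q = -3.5, E = -3.5,
        G = -0.4, H = -3.2, I = 4.5, L = 3.8, K = -3.9, M = 1.9, F = 2.8,
        P = -1.6, S = -0.8, T = -0.7, W = -0.9, Y = -1.3, V = 4.2)

# Hypothetical helper: mean hydropathy of every full window.
hydro_sketch <- function(s, window = 9) {
  v <- unname(kd[strsplit(toupper(s), "")[[1]]])
  stopifnot(length(v) >= window)
  starts <- seq_len(length(v) - window + 1)
  sapply(starts, function(i) mean(v[i:(i + window - 1)]))
}

hydro_sketch("MKFLVLALAL", window = 3)
```

Positive window means indicate hydrophobic stretches, and negative means indicate hydrophilic ones.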

Universal Sequence Reader

Description

Reads a FASTA or FASTQ file and returns it as a dataframe or list.

Usage

read_seq(file, format = "df")

Arguments

file

Path to the input sequence file

format

A string indicating "df" (dataframe) or "list" (default: "df")

Value

A dataframe or list of the sequence data.
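
To illustrate what reading a FASTA file into a dataframe involves, the sketch below parses a character vector of lines such as `readLines()` would return. The column names `header` and `sequence` are hypothetical and may differ from read_seq's output, and the sketch assumes every header is followed by at least one sequence line.

```r
# Hypothetical helper: parse FASTA lines into a dataframe.
parse_fasta_sketch <- function(lines) {
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)                 # record index for every line
  # Join the wrapped sequence lines belonging to each record
  seqs <- tapply(lines[!is_header], record[!is_header], paste, collapse = "")
  data.frame(header = sub("^>", "", lines[is_header]),
             sequence = unname(as.vector(seqs)),
             stringsAsFactors = FALSE)
}

fa <- c(">seq1", "ATGC", "ATGC", ">seq2", "GGCC")
parse_fasta_sketch(fa)
```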


Universal Reverse Complement

Description

Generates the reverse complement of a DNA or RNA sequence.

Usage

rev_comp(sequence)

Arguments

sequence

A character string containing the sequence

Value

A character string of the reverse complement
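
The operation complements each base and then reverses the string. A base-R sketch follows; `rev_comp_sketch` is a hypothetical helper, and treating any sequence that contains U as RNA is an assumption about how the type is detected.

```r
# Hypothetical helper: reverse complement for DNA or RNA.
rev_comp_sketch <- function(s) {
  s <- toupper(s)
  comp <- if (grepl("U", s, fixed = TRUE)) {
    chartr("ACGU", "UGCA", s)    # RNA: A pairs with U
  } else {
    chartr("ACGT", "TGCA", s)    # DNA: A pairs with T
  }
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

rev_comp_sketch("ATGC")  # "GCAT"
rev_comp_sketch("AUGC")  # "GCAU"
```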


Reverse Translation

Description

Converts a protein sequence back into DNA using common codons.

Usage

reverse_translate(s)

Arguments

s

A character string containing the protein sequence

Value

A character string of the resulting DNA sequence


RNA to DNA

Description

Reverse transcribes an RNA sequence into DNA.

Usage

rna_to_dna(s)

Arguments

s

A character string containing the RNA sequence

Value

A character string of the DNA sequence


Translate RNA to Protein

Description

Translates an RNA sequence into protein in all 6 reading frames.

Usage

rna_to_protein(s, table = 1)

Arguments

s

A character string containing the RNA sequence

table

Integer indicating the NCBI genetic code table (default: 1)

Value

A list of translated protein sequences


SARS-CoV-2 Genome Fragment

Description

A small fragment of the SARS-CoV-2 genome used for examples and testing.

Usage

sars_fragment

Format

A character string.

Source

NCBI GenBank


Motif Searching

Description

Finds all occurrences of a motif in a sequence.

Usage

search_motif(s, p)

Arguments

s

A character string containing the sequence

p

A character string containing the motif (regex)

Value

A dataframe with the Start, End, and Match string
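
A result of this shape can be assembled from `gregexpr()` in base R. The sketch below is not baseq's implementation; `motif_sketch` is a hypothetical helper, and it reports non-overlapping matches only.

```r
# Hypothetical helper: regex matches as a Start/End/Match dataframe.
motif_sketch <- function(s, p) {
  m <- gregexpr(p, s, perl = TRUE)[[1]]
  if (m[1] == -1) {                                  # no match found
    return(data.frame(Start = integer(0), End = integer(0),
                      Match = character(0), stringsAsFactors = FALSE))
  }
  starts <- as.integer(m)
  ends <- starts + attr(m, "match.length") - 1L
  data.frame(Start = starts, End = ends,
             Match = substring(s, starts, ends), stringsAsFactors = FALSE)
}

motif_sketch("ATGCATGC", "ATG")  # matches at 1-3 and 5-7
```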


Shuffle Sequence

Description

Randomly permutes the characters of a sequence.

Usage

shuffle_sequence(s)

Arguments

s

A character string containing the sequence

Value

A character string of the shuffled sequence
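
Shuffling preserves base composition while destroying order, which makes it useful as a null model. A base-R sketch with a hypothetical helper `shuffle_sketch`:

```r
# Hypothetical helper: random permutation of a sequence's characters.
shuffle_sketch <- function(s) {
  paste(sample(strsplit(s, "")[[1]]), collapse = "")
}

set.seed(1)                           # fixed seed for a reproducible shuffle
shuffled <- shuffle_sketch("ATGCATGC")
shuffled
```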


Virtual Digestion

Description

Simulates restriction enzyme digestion.

Usage

simulate_digestion(s, p)

Arguments

s

A character string containing the DNA sequence

p

A character string containing the restriction site (regex)

Value

A numeric vector of fragment lengths
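
Fragment lengths follow from the positions at which the site matches. The sketch below cuts immediately before each match, which is an assumption: real enzymes cut at a defined offset within the site, and the position baseq uses is not stated here. `digest_sketch` is a hypothetical helper.

```r
# Hypothetical helper: fragment lengths, cutting before each site match.
digest_sketch <- function(s, p) {
  m <- gregexpr(p, s, perl = TRUE)[[1]]
  cuts <- if (m[1] == -1) integer(0) else as.integer(m)
  cuts <- cuts[cuts > 1]                      # a cut at position 1 adds no fragment
  diff(c(1L, cuts, nchar(s) + 1L))            # distances between fragment starts
}

digest_sketch("AAGAATTCTTGAATTCAA", "GAATTC")  # 2 8 8
```

The fragment lengths always sum to the length of the input sequence, which is a quick sanity check on any digestion result.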


Simulate FASTA File

Description

Generates a dummy FASTA dataset.

Usage

simulate_fasta(n_seq = 5, seq_len = 100, gc = NULL, type = "DNA", file = NULL)

Arguments

n_seq

Number of sequences

seq_len

Length of each sequence

gc

Target GC content

type

"DNA" or "RNA"

file

Optional file path to save

Value

A dataframe of simulated sequences


Simulate FASTQ File

Description

Generates a dummy FASTQ dataset.

Usage

simulate_fastq(
  n_reads = 5,
  read_len = 100,
  gc = NULL,
  type = "DNA",
  file = NULL
)

Arguments

n_reads

Number of reads

read_len

Length of each read

gc

Target GC content

type

"DNA" or "RNA"

file

Optional file path to save

Value

A dataframe of simulated reads


PCR Simulator

Description

Simulates a PCR reaction and predicts amplicon sizes.

Usage

simulate_pcr(template, fwd, rev_p)

Arguments

template

A character string containing the DNA template

fwd

A character string of the forward primer

rev_p

A character string of the reverse primer

Value

A numeric vector of amplicon sizes
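
The forward primer matches the template strand directly, while the reverse primer binds the opposite strand, so its reverse complement is what appears in the template. The sketch below handles exact matches and only the first binding site of each primer; baseq may report several amplicons. `pcr_sketch` is a hypothetical helper.

```r
# Hypothetical helper: size of one amplicon from exact primer matches.
pcr_sketch <- function(template, fwd, rev_p) {
  # Reverse complement of the reverse primer, as it appears in the template
  rc <- paste(rev(strsplit(chartr("ACGT", "TGCA", rev_p), "")[[1]]),
              collapse = "")
  f <- regexpr(fwd, template, fixed = TRUE)
  r <- regexpr(rc, template, fixed = TRUE)
  if (f == -1 || r == -1 || r < f) return(numeric(0))  # no product
  as.numeric(r + nchar(rc) - f)   # forward primer start to reverse primer end
}

pcr_sketch("AAATGCCCCCCGGTAAAA", "ATGC", "TTAC")  # 14
```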


Simulate Sequence

Description

Generates a random DNA or RNA sequence.

Usage

simulate_sequence(len, gc = NULL, type = "DNA")

Arguments

len

Integer length of the sequence

gc

Numeric target GC content (0 to 1)

type

"DNA" or "RNA"

Value

A character string of the simulated sequence
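
One way to hit a target GC content on average is to sample bases with weighted probabilities: G and C each get gc / 2, and the remaining two bases each get (1 - gc) / 2. This is a sketch, not baseq's implementation. `simulate_sketch` is a hypothetical helper, and treating `gc = NULL` as uniform base frequencies is an assumption.

```r
# Hypothetical helper: random sequence with a target GC fraction.
simulate_sketch <- function(len, gc = NULL, type = "DNA") {
  bases <- if (type == "RNA") c("A", "C", "G", "U") else c("A", "C", "G", "T")
  prob <- if (is.null(gc)) {
    rep(0.25, 4)                                   # assumed uniform default
  } else {
    c((1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2)  # order matches `bases`
  }
  paste(sample(bases, len, replace = TRUE, prob = prob), collapse = "")
}

set.seed(42)                         # fixed seed for reproducibility
sim <- simulate_sketch(20, gc = 0.6)
sim
```

Because each base is drawn independently, the realized GC content of a short sequence will vary around the target.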


FASTA Summary

Description

Generates a comprehensive summary of a multi-FASTA file.

Usage

summarize_fasta(file)

Arguments

file

Path to the FASTA file

Value

A summary dataframe

Examples


# summarize_fasta("path/to/my.fasta")


Generic Translate

Description

Generic function to translate DNA or RNA to protein.

Usage

translate(x, ...)

Arguments

x

A baseq_dna or baseq_rna object

...

Additional arguments

Value

A list of translated sequences


Universal Sequence Writer

Description

Writes a sequence object (dataframe or list) to a FASTA or FASTQ file.

Usage

write_seq(x, file)

Arguments

x

A sequence object (dataframe or list)

file

Path to the output sequence file

Value

Invisible TRUE