Introduction to baseq

library(baseq)

Introduction

baseq is a basic sequence-processing package for biological data. It provides simple functions for common molecular biology tasks: cleaning sequences, translating DNA or RNA to protein, calculating GC content, and reading and writing FASTA and FASTQ files.

Sequence Cleaning

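clean_seq() cleans a DNA or RNA sequence by removing every character that is not a standard base. It detects the sequence type automatically, so the same call works for both DNA and RNA. As the RNA example shows, lowercase bases are kept and returned in uppercase, while ambiguity codes such as N, R, Y, M and K are dropped.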

dna_seq <- "ATGCnNryMK"
clean_seq(dna_seq)
#> [1] "ATGC"

rna_seq <- "AUGGCuuNnRYMK"
clean_seq(rna_seq)
#> [1] "AUGGCUU"
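
To make the cleaning rule concrete, here is a minimal base R sketch that reproduces the two outputs above. It is not baseq's implementation: the helper name clean_sketch() is hypothetical, and the type detection (treat the input as RNA if it contains a U) is an assumption made for illustration.

```r
# Hypothetical helper, not part of baseq: uppercase the input, guess the
# alphabet, then drop every character outside that alphabet.
clean_sketch <- function(x) {
  x <- toupper(x)
  # Assumption: a sequence containing U is RNA, anything else is DNA.
  alphabet <- if (grepl("U", x, fixed = TRUE)) "ACGU" else "ACGT"
  gsub(paste0("[^", alphabet, "]"), "", x)
}

clean_sketch("ATGCnNryMK")
#> [1] "ATGC"
clean_sketch("AUGGCuuNnRYMK")
#> [1] "AUGGCUU"
```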

Translation

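baseq can translate DNA and RNA sequences into protein in all six reading frames: three on the forward strand and three on the reverse complement. dna_to_protein() returns the translations indexed by frame name, so the first forward frame is retrieved as "Frame F1". In the example below, frame 1 reads ATC GAG CTA GCT AGC and then reaches the stop codon TAG, which is consistent with the output ending after five residues.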

dna_seq <- "ATCGAGCTAGCTAGCTAGCTAGCT"
proteins <- dna_to_protein(dna_seq)
proteins[["Frame F1"]]
#> [1] "IELAS"
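
The idea behind six-frame translation can be sketched in base R as follows. This is an illustration under stated assumptions, not baseq's code: the helper names (translate_frame(), reverse_complement(), six_frames()) and the frame labels F1 to R3 are hypothetical, the standard genetic code is assumed, and translation is assumed to stop at the first stop codon, which matches the "IELAS" output above.

```r
# Standard genetic code (NCBI translation table 1), codons in T, C, A, G order.
bases <- c("T", "C", "A", "G")
grid <- expand.grid(third = bases, second = bases, first = bases,
                    stringsAsFactors = FALSE)
codon_table <- setNames(
  strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
           "")[[1]],
  paste0(grid$first, grid$second, grid$third)
)

# Translate one reading frame (offset 0, 1 or 2).
# Assumption: translation ends at the first stop codon.
translate_frame <- function(x, offset = 0) {
  x <- chartr("U", "T", toupper(x))          # treat RNA input as DNA
  if (nchar(x) - offset < 3) return("")      # no complete codon available
  starts <- seq(offset + 1, nchar(x) - 2, by = 3)
  aa <- unname(codon_table[substring(x, starts, starts + 2)])
  aa[is.na(aa)] <- "X"                       # codon with a non-ACGT character
  stop_at <- match("*", aa)
  if (!is.na(stop_at)) aa <- aa[seq_len(stop_at - 1)]
  paste(aa, collapse = "")
}

# Reverse complement, used for the three reverse frames.
reverse_complement <- function(x) {
  x <- chartr("ACGT", "TGCA", chartr("U", "T", toupper(x)))
  paste(rev(strsplit(x, "")[[1]]), collapse = "")
}

# Three forward frames followed by three reverse frames.
six_frames <- function(x) {
  rc <- reverse_complement(x)
  frames <- c(
    vapply(0:2, function(i) translate_frame(x, i), character(1)),
    vapply(0:2, function(i) translate_frame(rc, i), character(1))
  )
  names(frames) <- c("F1", "F2", "F3", "R1", "R2", "R3")
  frames
}

six_frames("ATCGAGCTAGCTAGCTAGCTAGCT")[["F1"]]
#> [1] "IELAS"
```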

GC Content

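gc_content() returns the GC content of a DNA sequence as a percentage, that is, the share of bases that are G or C. In the example below, 4 of the 8 bases are G or C, giving 50.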

dna_seq <- "ATGCATGC"
gc_content(dna_seq)
#> [1] 50
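
The same calculation can be written by hand in base R. This is a minimal sketch with a hypothetical helper name, gc_percent(); how baseq itself treats case, ambiguity codes or empty input is not covered by this vignette, so the choices below are assumptions.

```r
# Hypothetical helper, not part of baseq: percentage of G and C bases.
gc_percent <- function(x) {
  bases <- strsplit(toupper(x), "")[[1]]  # split into single characters
  # Assumption: every character counts toward the length, including
  # any ambiguity codes. An empty string yields NaN (0 / 0).
  100 * sum(bases %in% c("G", "C")) / length(bases)
}

gc_percent("ATGCATGC")
#> [1] 50
```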

Reading and Writing Files

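baseq provides a single pair of functions, read_seq() and write_seq(), for reading and writing both FASTA and FASTQ files. Sequences are read into, and written from, a data frame.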

# Read a FASTA file into a data frame
# df <- read_seq("path/to/file.fasta")

# Write a data frame to a FASTA file
# write_seq(df, "output.fasta")
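
For readers unfamiliar with the FASTA layout, the following base R sketch shows one way a file can be parsed into a data frame. It is not read_seq(): the helper name read_fasta_sketch() and the column names name and sequence are hypothetical, and the structure of the data frame that baseq returns may differ.

```r
# Hypothetical helper, not part of baseq: parse a FASTA file in which each
# record is a ">" header line followed by one or more sequence lines.
read_fasta_sketch <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]             # drop blank lines
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)               # record index for every line
  # Group sequence lines by record, keeping records that have no sequence.
  groups <- split(
    lines[!is_header],
    factor(record[!is_header], levels = seq_len(sum(is_header)))
  )
  data.frame(
    name = sub("^>", "", lines[is_header]),
    sequence = unname(vapply(groups, paste, character(1), collapse = "")),
    stringsAsFactors = FALSE
  )
}

# Usage with a temporary two-record file.
tmp <- tempfile(fileext = ".fasta")
writeLines(c(">seq1", "ATGC", "ATGC", ">seq2", "GGCC"), tmp)
fasta_df <- read_fasta_sketch(tmp)
fasta_df$sequence
#> [1] "ATGCATGC" "GGCC"
```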

For more details, see the documentation for individual functions.