An introduction to baseverse

Overview

baseverse is intended to be a relatively minimal suite of functions that supports the use of base R with the native pipe (|>).

Several functions wrap existing base-R functions, taking the data as their first argument so that they work with the native pipe:

p_cor(): a wrapper for cor()
p_glm(): a wrapper for glm()
p_lm(): a wrapper for lm()
p_t.test(): a wrapper for t.test()
p_table(): a wrapper for table()
p_wilcox.test(): a wrapper for wilcox.test()
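To see why such wrappers help, recall that the native pipe places its left-hand side in the first argument of the call on the right, whereas lm() and its relatives take the formula first and the data second. The following is a minimal sketch, not the package's actual source, of how a data-first wrapper could be written. The name p_lm_sketch is hypothetical, and mtcars stands in for real data.

```r
# Hypothetical data-first wrapper around stats::lm(); an illustration only,
# not baseverse's implementation.
p_lm_sketch <- function(data, formula) {
  stats::lm(formula = formula, data = data)
}

# The data flows in from the left, so the call reads naturally with |>.
fit_piped <- mtcars |> p_lm_sketch(mpg ~ wt + hp)

# The same model fitted the conventional way.
fit_plain <- lm(mpg ~ wt + hp, data = mtcars)

# Both approaches give the same coefficients.
same_fit <- all.equal(coef(fit_piped), coef(fit_plain))
```

A wrapper of this shape records its call as stats::lm(formula = formula, data = data), which is the same Call line that appears in the regression summary later in this vignette.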

Other functions wrap base-R operators:

bang(): a wrapper for !, similar to not() from magrittr
bracket(): a wrapper for []
dollar(): a wrapper for $, similar to pull() from dplyr
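Operators such as !, [] and $ are awkward to place at the end of a pipeline in their usual form, which is the motivation for named wrappers. The sketch below shows one way such helpers could be written in base R. The *_sketch names are hypothetical and the bodies are assumptions, not the package's code.

```r
# Hypothetical stand-ins for bang(), bracket() and dollar(); illustration only.
bang_sketch <- function(x) !x

bracket_sketch <- function(x, ...) x[...]

dollar_sketch <- function(data, name) {
  # Capture the unquoted column name and look it up as a string.
  data[[as.character(substitute(name))]]
}

negated <- c(TRUE, FALSE) |> bang_sketch()
subset_rows <- mtcars |> bracket_sketch(1:3, c("mpg", "wt"))
mpg_values <- mtcars |> dollar_sketch(mpg)
```

Each helper simply forwards to the underlying operator, so the results match !x, mtcars[1:3, c("mpg", "wt")] and mtcars$mpg respectively.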

Other functions mimic tidyverse functions:

base_match(): mimics case_match(), but returns a factor and respects the user’s desired order of groups
base_when(): mimics case_when(), but returns a factor and respects the user’s desired order of groups
et(): mimics count()

Loading the package

Load the package:

library(baseverse)

Loading the data

This vignette will draw from the built-in nhanes data:

data(nhanes)

Country of birth

Table the dmdborn4 variable:

nhanes |> p_table(dmdborn4)

## 
##     1     2 
## 10039  1875

Create a new, labelled version of dmdborn4:

nhanes <- nhanes |>
  transform(
    country = base_match(dmdborn4, 'USA' = 1, 'Other' = 2)
  )

Table the new variable using p_table():

nhanes |> p_table(country)

## 
##   USA Other 
## 10039  1875

Or, table the new variable using et():

nhanes |> et(country)

##   country     n
## 1     USA 10039
## 2   Other  1875
## 3    <NA>    19

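Notice that the USA group is listed first, following the order in which the groups were given to base_match(). This is a deliberate difference from case_match(), which returns a character vector here, so tabulating its result would list the groups alphabetically (Other before USA).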
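The effect of returning a factor can be reproduced with base R alone. The sketch below uses a small made-up vector of codes rather than the nhanes data. It compares tabulating character labels with tabulating a factor whose levels are set explicitly, and then builds a count()-style data frame that keeps the missing values.

```r
# Toy codes standing in for dmdborn4 (1 = USA, 2 = Other); not the real data.
codes <- c(1, 2, 1, NA, 1)

# Character labels: table() orders the groups alphabetically.
labels_chr <- ifelse(codes == 1, "USA", "Other")
tab_chr <- table(labels_chr)

# Factor labels: table() follows the level order chosen by the user.
labels_fct <- factor(codes, levels = c(1, 2), labels = c("USA", "Other"))
tab_fct <- table(labels_fct)

# A count()-like data frame that keeps a row for missing values.
counts <- as.data.frame(
  table(country = labels_fct, useNA = "ifany"),
  responseName = "n"
)
```

Here names(tab_chr) puts Other first, while names(tab_fct) puts USA first. table() drops missing values unless useNA is set, which is consistent with the p_table() output above omitting the 19 missing values that et() reports.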

Total cholesterol

Summarize the lbxtc variable:

nhanes$lbxtc |> summary()

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    62.0   151.0   178.0   181.5   207.0   438.0    5043

Or, using dollar():

nhanes |> dollar(lbxtc) |> summary()

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    62.0   151.0   178.0   181.5   207.0   438.0    5043

Create a categorical variable for total cholesterol:

nhanes <- nhanes |>
  transform(
    cholesterol = base_when(
      'Desirable' = (lbxtc < 200),
      'Borderline high' = (lbxtc >= 200) & (lbxtc < 240),
      'High' = (lbxtc >= 240)
    )
  )

Table the new variable using p_table():

nhanes |> p_table(cholesterol)

## 
##       Desirable Borderline high            High 
##            4797            1460             633

Or, table the new variable using et():

nhanes |> et(cholesterol)

##       cholesterol    n
## 1       Desirable 4797
## 2 Borderline high 1460
## 3            High  633
## 4            <NA> 5043

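Notice that the Desirable group is listed first, following the order in which the conditions were given to base_when(). This is a deliberate difference from case_when(), which returns a character vector here, so tabulating its result would list the groups alphabetically (Borderline high, Desirable, High).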
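For comparison, the same three-way split can be written with base R's cut(), which also returns a factor whose levels follow the order of the labels. This is an alternative technique shown for illustration, not a description of how base_when() works internally, and the values in tc are made up.

```r
# Made-up total cholesterol values, including both cut points and an NA.
tc <- c(150, 199, 200, 239, 240, 300, NA)

# right = FALSE makes each interval closed on the left: [200, 240) and so on,
# matching the conditions lbxtc < 200, 200 <= lbxtc < 240, lbxtc >= 240.
chol <- cut(
  tc,
  breaks = c(-Inf, 200, 240, Inf),
  labels = c("Desirable", "Borderline high", "High"),
  right = FALSE
)

chol_levels <- levels(chol)
```

cut() only handles contiguous ranges of a single numeric variable, whereas a case_when()-style interface accepts arbitrary logical conditions.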

Linear regression

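Fit a linear model for systolic blood pressure (bpxosy1), with age (ridageyr), country of birth (country) and total cholesterol (lbxtc) as predictors:

model_1 <- nhanes |>
  p_lm(bpxosy1 ~ ridageyr + country + lbxtc)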

Summarize the model:

model_1 |>
  summary()

## 
## Call:
## stats::lm(formula = formula, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57.672 -10.213  -1.227   8.520 107.359 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  97.440904   0.907807 107.337  < 2e-16 ***
## ridageyr      0.401313   0.009199  43.626  < 2e-16 ***
## countryOther -0.095695   0.509981  -0.188    0.851    
## lbxtc         0.020008   0.004775   4.190 2.82e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16 on 6553 degrees of freedom
##   (5376 observations deleted due to missingness)
## Multiple R-squared:  0.2417, Adjusted R-squared:  0.2414 
## F-statistic: 696.4 on 3 and 6553 DF,  p-value: < 2.2e-16

Obtain 95% confidence intervals for the coefficients:

model_1 |>
  confint()

##                    2.5 %      97.5 %
## (Intercept)  95.66130724 99.22050174
## ridageyr      0.38328041  0.41934603
## countryOther -1.09542428  0.90403489
## lbxtc         0.01064781  0.02936871
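These intervals are the usual t-based intervals for a linear model: each estimate plus or minus the standard error times the 97.5% quantile of the t distribution on the residual degrees of freedom. For the ridageyr row above, 0.401313 ± 1.96 × 0.009199 gives approximately (0.3833, 0.4193), which agrees with the output. The sketch below checks the same calculation against confint() on a model fitted to mtcars, used here only as a stand-in data set.

```r
# Stand-in model on a built-in data set; not the nhanes model above.
fit <- lm(mpg ~ wt + hp, data = mtcars)

est <- coef(fit)                            # point estimates
se <- sqrt(diag(vcov(fit)))                 # standard errors
t_crit <- qt(0.975, df = df.residual(fit))  # two-sided 95% critical value

manual_ci <- cbind(est - t_crit * se, est + t_crit * se)
colnames(manual_ci) <- c("2.5 %", "97.5 %")

# The manual intervals agree with confint().
ci_match <- all.equal(manual_ci, confint(fit), check.attributes = FALSE)
```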