Use cases

The surveytable package can be used with different types of data. Here are some examples.

Unweighted data

Unweighted data is stored in a data.frame or a similar object. One example of such a similar object is a tibble (tbl), which can be produced by the tibble package. data.frames and similar objects do not contain information about survey design variables. Thus, surveytable treats these objects as unweighted data, with each observation having a weight of 1.

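The example below illustrates how to use surveytable with unweighted data. We convert the namcs2019sv_df data.frame to a tibble, pass it to set_survey(), and tabulate the SPECCAT variable with tab().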

library(surveytable)
library(tibble)

mytbl = as_tibble(namcs2019sv_df)

set_survey(mytbl)
#> * mytbl: the survey is unweighted.
Survey info {mytbl (unweighted)}
Variables  Observations  Design
33         8,250         Independent Sampling design (with replacement) survey::svydesign(ids = ~1, probs = rep(1, nrow(design)), data = design)
tab("SPECCAT")
Type of specialty (Primary, Medical, Surgical) {mytbl (unweighted)}
Level                        n  Number  SE     LL     UL  Percent   SE    LL    UL
Primary care specialty   2,993   2,993  44  2,909  3,080     36.3  0.5  35.2  37.3
Surgical care specialty  3,050   3,050  44  2,965  3,137     37.0  0.5  35.9  38.0
Medical care specialty   2,207   2,207  40  2,130  2,287     26.8  0.5  25.8  27.7
N = 8250.
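
The design reported in the survey info above can be reproduced directly with the survey package. The following is a minimal sketch, assuming a small made-up data.frame (toy and its grp column are hypothetical and not part of surveytable). It suggests that a design built with ids = ~1 and probs = rep(1, n) gives every observation a weight of 1, so that estimated totals are simply counts.

```r
library(survey)

# Hypothetical toy data, for illustration only
toy = data.frame(grp = factor(c("a", "b", "a", "c", "a")))

# Same form of design that the survey info above reports for unweighted data
toy_design = svydesign(ids = ~1, probs = rep(1, nrow(toy)), data = toy)

# Sampling weights are 1 / probs, so every observation has a weight of 1
w = weights(toy_design)

# With unit weights, the estimated totals should equal the raw counts
est = coef(svytotal(~grp, toy_design))
counts = table(toy$grp)
```

This matches the unweighted table above, in which the Number column is identical to the n column.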

Complex survey

A complex survey is defined by its data as well as its survey design variables. In R, a complex survey is stored in a survey object. This object, in addition to containing the survey data, also contains information about the survey design variables. These include variables that specify such things as the sampling clusters (primary sampling units), the strata, and the observation weights.

You can convert a data.frame or a similar object to a survey object using the survey::svydesign() command. Before using this command, you should consult the documentation for the survey that you are analyzing to find out what the survey design variables are.

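The example below illustrates how to use surveytable with a complex survey. We create a survey object with survey::svydesign(), specifying the cluster (ids), strata, and weights variables, pass it to set_survey(), and tabulate the SPECCAT variable with tab().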

library(surveytable)

mysurvey = survey::svydesign(ids = ~ CPSUM
  , strata = ~ CSTRATM
  , weights = ~ PATWT
  , data = namcs2019sv_df)

set_survey(mysurvey)
Survey info {mysurvey}
Variables  Observations  Design
33         8,250         Stratified 1 - level Cluster Sampling design (with replacement) With (398) clusters. survey::svydesign(ids = ~CPSUM, strata = ~CSTRATM, weights = ~PATWT, data = namcs2019sv_df)
tab("SPECCAT")
Type of specialty (Primary, Medical, Surgical) {mysurvey}
Level                        n       Number          SE           LL           UL  Percent   SE    LL    UL
Primary care specialty   2,993  521,466,378  31,136,212  463,840,192  586,251,877     50.3  2.6  45.1  55.5
Surgical care specialty  3,050  214,831,829  31,110,335  161,661,415  285,489,984     20.7  3.0  15.1  27.3
Medical care specialty   2,207  300,186,150  43,496,739  225,806,019  399,066,973     29.0  3.6  22.1  36.6
N = 8250.
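
In the table above, the Number column differs from n because each observation is multiplied by its survey weight. The following is a minimal sketch of that idea, using only the survey package and a made-up data.frame. The names toy, stratum, psu, wt, and y are hypothetical; for a real survey, the design variables come from that survey's documentation.

```r
library(survey)

# Hypothetical toy data: 2 strata, 2 clusters per stratum, 2 observations per cluster
toy = data.frame(
  stratum = c(1, 1, 1, 1, 2, 2, 2, 2)
  , psu = c(1, 1, 2, 2, 3, 3, 4, 4)
  , wt = c(10, 10, 20, 20, 5, 5, 15, 15)
  , y = c(1, 0, 1, 1, 0, 0, 1, 0)
)

# Clusters, strata, and weights are specified as one-sided formulas
toy_design = svydesign(ids = ~psu, strata = ~stratum, weights = ~wt, data = toy)

# Weighted total of y, as estimated by the survey package
est_total = as.numeric(coef(svytotal(~y, toy_design)))

# The same point estimate computed by hand: sum of weight times value
manual_total = sum(toy$wt * toy$y)
```

The point estimate depends only on the weights; the cluster and strata variables affect the standard errors and confidence intervals.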

Spark-based complex survey

If you are working with big data, that data might be stored in a database, such as Apache Spark. surveytable can work with a survey whose data lives in a database.

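The example below illustrates how to use surveytable with a Spark-based complex survey. We connect to Spark, copy namcs2019sv_df to it, create a survey object from the Spark data, pass it to set_survey(), tabulate the SPECCAT variable with tab(), and disconnect from Spark.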

Note that, for this example, we are using a "local" Spark connection – how you connect to Spark depends on your setup.

library(surveytable)
library(sparklyr)
library(dplyr)

sc = spark_connect(master = "local")
#> * Using Spark: 3.5.5

mysparkdf = copy_to(sc, namcs2019sv_df)
mysurvey = survey::svydesign(ids = ~CPSUM, strata = ~CSTRATM
                             , weights = ~PATWT, data = mysparkdf)

set_survey(mysurvey)
Survey info {mysurvey}
Variables  Observations  Design
33         8,250         Stratified 1 - level Cluster Sampling design (with replacement) With (398) clusters. survey::svydesign(ids = ~CPSUM, strata = ~CSTRATM, weights = ~PATWT, data = mysparkdf)
tab("SPECCAT")
SPECCAT {mysurvey}
Level                        n       Number          SE           LL           UL  Percent   SE    LL    UL
Medical care specialty   2,207  300,186,150  43,496,739  225,806,019  399,066,973     29.0  3.6  22.1  36.6
Primary care specialty   2,993  521,466,378  31,136,212  463,840,192  586,251,877     50.3  2.6  45.1  55.5
Surgical care specialty  3,050  214,831,829  31,110,335  161,661,415  285,489,984     20.7  3.0  15.1  27.3
N = 8250.
spark_disconnect_all()
#> [1] 1

Complex survey with replicate weights

Some surveys, instead of specifying survey design variables, specify replicate weights. They might do this, for example, for privacy reasons.

You can convert a data.frame or a similar object to a survey object that uses replicate weights using the survey::svrepdesign() command.

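The example below illustrates how to use surveytable with a complex survey that uses replicate weights. We add 20 randomly generated replicate-weight columns and a randomly generated weight column to namcs2019sv_df, create a survey object with survey::svrepdesign(), pass it to set_survey(), and tabulate the SPECCAT variable with tab(). The weights in this example are fake and serve only to demonstrate the mechanics.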

library(surveytable)

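mydata = namcs2019sv_df
nr = nrow(mydata)
set.seed(42)
# Add 20 fake replicate-weight columns (random values, for illustration only)
for (ii in 1:20) {
  mydata[,paste0("fake_repw", ii)] = runif(nr, 10, 1000)
}
# Add a fake sampling-weight column
mydata$fake_w = runif(nr, 10, 1000)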

mysurvey = survey::svrepdesign(
  repweights = "fake_repw*"
  , weights = ~fake_w
  , data = mydata
)

set_survey(mysurvey)
Survey info {mysurvey}
Variables  Observations  Design
54         8,250         Call: svrepdesign.default(repweights = "fake_repw*", weights = ~fake_w, data = mydata) Balanced Repeated Replicates with 20 replicates.
tab("SPECCAT")
Type of specialty (Primary, Medical, Surgical) {mysurvey}
Level                        n     Number      SE         LL         UL  Percent   SE    LL    UL
Primary care specialty   2,993  1,504,579  16,005  1,473,519  1,536,295     36.3  0.3  35.7  36.8
Surgical care specialty  3,050  1,520,930  13,701  1,494,299  1,548,036     36.7  0.3  36.0  37.3
Medical care specialty   2,207  1,123,957  10,713  1,103,140  1,145,166     27.1  0.2  26.6  27.6
N = 8250.
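
The example above does not pass a type argument, so survey::svrepdesign() falls back on its default, which the survey info reports as Balanced Repeated Replicates. For a real survey, the replicate-weight method comes from that survey's documentation and would normally be stated explicitly. The following is a minimal sketch of doing so, assuming made-up data with delete-one jackknife replicate weights. The names toy, y, w, and repw1 through repw6 are hypothetical, and the weights are constructed here only for illustration.

```r
library(survey)

# Hypothetical toy data: 6 observations, each with a sampling weight of 5
n = 6
toy = data.frame(y = c(2, 4, 6, 8, 10, 12), w = rep(5, n))

# Build delete-one jackknife replicate weights: replicate ii drops
# observation ii and scales up the remaining weights by n / (n - 1)
for (ii in 1:n) {
  rw = toy$w * n / (n - 1)
  rw[ii] = 0
  toy[, paste0("repw", ii)] = rw
}

# repweights is a regular expression that selects the replicate-weight columns;
# type states the replication method explicitly instead of relying on the default
toy_design = svrepdesign(
  repweights = "^repw[0-9]+$"
  , weights = ~w
  , data = toy
  , type = "JK1"
  , combined.weights = TRUE
)

# Point estimate and replicate-based standard error of the mean of y
est = svymean(~y, toy_design)
est_mean = as.numeric(coef(est))
est_se = as.numeric(SE(est))
```

The point estimate uses only the main weights; the replicate weights determine the standard error, which is why the method and its scaling need to match what the survey's documentation specifies.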