The surveytable package can be used with different types of data. Here are some examples.
Unweighted data is stored in a data.frame or a similar object. One example of such a similar object is a tibble (tbl), which can be produced by the tibble package. data.frames and similar objects do not contain information about survey design variables. Thus, surveytable treats these objects as unweighted data, with each observation having a weight of 1.
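To see what a weight of 1 means in practice, here is a minimal base-R sketch. The toy data frame and its column names (specialty, w) are made up for illustration and are not part of surveytable. With every weight equal to 1, the weighted total for each level should be the same as the plain count.

```r
# Toy data, invented for illustration only.
toy = data.frame(
  specialty = c("Primary", "Primary", "Surgical", "Medical", "Surgical", "Primary")
)

# Unweighted data: every observation counts once, i.e. has a weight of 1.
toy$w = 1

# Plain counts per level.
counts = table(toy$specialty)

# Weighted totals per level: the sum of the weights within each level.
weighted_totals = tapply(toy$w, toy$specialty, sum)

print(counts)
print(weighted_totals)
```

If the weights were not all equal to 1, the two results would generally differ, which is why weighted data needs a survey object rather than a plain data.frame.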
The example below illustrates how to use surveytable with unweighted data. We create a tibble from a data.frame; tell surveytable to work with this object; and tabulate the SPECCAT (physician specialty) variable from these data.
library(surveytable)
library(tibble)
mytbl = as_tibble(namcs2019sv_df)
set_survey(mytbl)
#> * mytbl: the survey is unweighted.
Survey info {mytbl (unweighted)}
Variables | Observations | Design |
---|---|---|
tab("SPECCAT")
Type of specialty (Primary, Medical, Surgical) {mytbl (unweighted)}
Level | n | Number | SE | LL | UL | Percent | SE | LL | UL |
---|---|---|---|---|---|---|---|---|---|
N = 8250.
A complex survey is defined by its data as well as its survey design variables. In R, a complex survey is stored in a survey object. In addition to the survey data, this object contains information about the survey design variables. These include variables that specify such things as the sampling units (clusters), the strata, and the sampling weights.
You can convert a data.frame or a similar object to a survey object using the survey::svydesign() command. Before using this command, you should consult the documentation for the survey that you are analyzing to find out what the survey design variables are.
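Once you know the names of the design variables from the survey documentation, one simple safeguard is to confirm that they are present in your data before calling survey::svydesign(). The sketch below uses only base R. The helper name check_design_vars() and the toy data are hypothetical; the variable names are the ones used in the NAMCS example in this document.

```r
# Hypothetical helper (not part of surveytable or survey): stop with a clear
# message if any of the expected design variables is missing from the data.
check_design_vars = function(data, vars) {
  missing_vars = setdiff(vars, names(data))
  if (length(missing_vars) > 0) {
    stop("Missing design variables: ", paste(missing_vars, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy data with made-up values, for illustration only.
toy = data.frame(
  CPSUM = c(1, 1, 2, 2),
  CSTRATM = c(10, 10, 20, 20),
  PATWT = c(150.5, 200, 98.2, 310)
)

# All three variables are present, so this passes silently.
check_design_vars(toy, c("CPSUM", "CSTRATM", "PATWT"))

# A misspelled or absent variable is reported by name.
result = tryCatch(
  check_design_vars(toy, c("CPSUM", "STRATUM")),
  error = function(e) conditionMessage(e)
)
print(result)
```

A check like this only confirms that the columns exist; whether they are the right design variables is something only the survey documentation can tell you.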
The example below illustrates how to use surveytable with a complex survey. We create a survey object from a data.frame; tell surveytable to work with this object; and tabulate the SPECCAT variable from the survey.
library(surveytable)
mysurvey = survey::svydesign(ids = ~ CPSUM
, strata = ~ CSTRATM
, weights = ~ PATWT
, data = namcs2019sv_df)
set_survey(mysurvey)
Survey info {mysurvey}
Variables | Observations | Design |
---|---|---|
tab("SPECCAT")
Type of specialty (Primary, Medical, Surgical) {mysurvey}
Level | n | Number | SE | LL | UL | Percent | SE | LL | UL |
---|---|---|---|---|---|---|---|---|---|
N = 8250.
If you are working with big data, that data might be stored in a database, such as Apache Spark. surveytable can work with a survey whose data lives in a database.
The example below illustrates how to use surveytable with a Spark-based complex survey. We connect to Spark; copy a data.frame to Spark; create a survey object from the Spark data; tell surveytable to work with this object; tabulate the SPECCAT variable from the survey; and finally disconnect from Spark.
Note that, for this example, we are using a "local" Spark connection; how you connect to Spark depends on your setup.
library(surveytable)
library(sparklyr)
sc = spark_connect(master = "local")
mysparkdf = copy_to(sc, namcs2019sv_df)
mysurvey = survey::svydesign(ids = ~CPSUM, strata = ~CSTRATM
, weights = ~PATWT, data = mysparkdf)
set_survey(mysurvey)
Survey info {mysurvey}
Variables | Observations | Design |
---|---|---|
tab("SPECCAT")
SPECCAT {mysurvey}
Level | n | Number | SE | LL | UL | Percent | SE | LL | UL |
---|---|---|---|---|---|---|---|---|---|
N = 8250.
spark_disconnect(sc)
Some surveys, instead of specifying survey design variables, specify replicate weights. They might do this, for example, for privacy reasons.
You can convert a data.frame or a similar object to a survey object that uses replicate weights using the survey::svrepdesign() command.
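Replicate weights are used to estimate variances: the estimate is recomputed once with each set of replicate weights, and the spread of those replicate estimates around the full-sample estimate gives the standard error. The base-R sketch below illustrates the idea for a weighted total. The data, the weights, and the scale factor are all assumptions made for illustration; the correct scale factor depends on the replication method, which the survey's documentation should specify.

```r
# Toy data, invented for illustration only.
y = c(2, 4, 6, 8)        # variable of interest
w = c(10, 10, 20, 20)    # main (full-sample) weights

# Two sets of made-up replicate weights, one per column.
repw = cbind(
  c(20, 0, 40, 0),
  c(0, 20, 0, 40)
)

# Full-sample estimate of the weighted total.
theta = sum(w * y)

# The same estimate, recomputed with each set of replicate weights.
theta_rep = colSums(repw * y)

# Assumed scale factor: 1 / (number of replicates), as in balanced repeated
# replication. Other replication methods use a different factor.
scale_factor = 1 / ncol(repw)

# Standard error from the spread of the replicate estimates.
se = sqrt(scale_factor * sum((theta_rep - theta)^2))

print(theta)
print(theta_rep)
print(se)
```

In practice, survey::svrepdesign() and the estimation functions that work with it take care of this calculation; the sketch is only meant to show why a set of replicate weights can stand in for the design variables.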
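The example below illustrates how to use surveytable with a complex survey that uses replicate weights. We create a survey object that uses replicate weights (the weights in this example are randomly generated, purely for illustration); tell surveytable to work with this object; and tabulate the SPECCAT variable from the survey.
library(surveytable)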
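mydata = namcs2019sv_df
nr = nrow(mydata)
# Generate 20 made-up replicate weights and a made-up main weight (illustration only)
set.seed(42)
for (ii in 1:20) {
  mydata[, paste0("fake_repw", ii)] = runif(nr, 10, 1000)
}
mydata$fake_w = runif(nr, 10, 1000)
# repweights is a regular expression matched against the column names
mysurvey = survey::svrepdesign(
  repweights = "fake_repw*"
  , weights = ~fake_w
  , data = mydata
)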
set_survey(mysurvey)
Survey info {mysurvey}
Variables | Observations | Design |
---|---|---|
tab("SPECCAT")
Type of specialty (Primary, Medical, Surgical) {mysurvey}
Level | n | Number | SE | LL | UL | Percent | SE | LL | UL |
---|---|---|---|---|---|---|---|---|---|
N = 8250.