Title: | Multilevel Mantel-Haenszel Statistics for Differential Item Functioning Detection |
Version: | 1.1 |
Author: | Shenghai Dai [aut, cre], Brian F. French [aut], W. Holmes Finch [aut], Andrew Iverson [aut] |
Maintainer: | Shenghai Dai <s.dai@wsu.edu> |
Description: | Clustered or multilevel data structures are common in the assessment of differential item functioning (DIF), particularly in the context of large-scale assessment programs. This package allows users to implement extensions of the Mantel-Haenszel DIF detection procedures in the presence of multilevel data based on the work of Begg (1999) <doi:10.1111/j.0006-341X.1999.00302.x>, Begg & Paykin (2001) <doi:10.1080/00949650108812115>, and French & Finch (2013) <doi:10.1177/0013164412472341>. |
Depends: | R (≥ 3.5.0), stats |
Imports: | TestDataImputation, plyr |
NeedsCompilation: | no |
Encoding: | UTF-8 |
LazyData: | true |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
RoxygenNote: | 7.0.2 |
Packaged: | 2020-03-19 13:51:44 UTC; daish |
Repository: | CRAN |
Date/Publication: | 2020-03-20 17:10:06 UTC |
Function to create contingency tables
Description
This function creates contingency tables by stratum for each item. Both dichotomous and polytomous item responses are allowed. It also handles missing responses and returns a cleaned data set with no missing data.
Usage
ContigencyTables (Response.data, Response.code=c(0,1),
Group, group.names=NULL, Stratum=NULL, Cluster=NULL,
missing.code="NA", missing.impute="LW", print.information=TRUE)
Arguments
Response.data |
A matrix of scored item responses, supplied as a matrix or data frame. It should not include any other variables (group, stratum, cluster, etc.). |
Response.code |
A numerical vector of all possible item responses. By default, Response.code=c(0,1). |
Group |
The variable of group membership (e.g., gender). Its length should be equal to the sample size of the item response matrix. |
group.names |
Names for each defined group (e.g., c('Male','Female')). This argument is optional. By default, group.names=NULL. If not provided, group names of "Group.1, Group.2, etc." will be automatically generated. |
Stratum |
The matching variable. By default, Stratum=NULL. If not provided, the observed total score will be used. |
Cluster |
The cluster variable. Its length should be equal to the sample size of the item response matrix. By default, Cluster=NULL. This variable is not used to generate contingency tables, but it is included in the returned data set for DIF analysis. |
missing.code |
Indication of how missing values were defined in the data. By default, missing.code="NA". |
missing.impute |
The approach used to handle missing item responses. By default, missing.impute="LW", indicating that listwise deletion will be used. Other options include "PM" (person mean or row mean imputation), "IM" (item mean or column mean imputation), "TW" (two-way imputation), "LR" (logistic regression imputation), and "EM" (EM imputation). See the package "TestDataImputation" (https://cran.r-project.org/package=TestDataImputation) for more details. |
print.information |
Indicator of whether function running information is printed on screen. By default, print.information=TRUE. |
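The two simplest missing.impute options can be pictured in base R. The sketch below is illustrative only: it uses a small made-up response matrix and base R calls rather than the package's internal code or the TestDataImputation routines, and the rounding rule for the person mean is an assumption.

```r
# Toy scored responses (hypothetical data): 5 persons x 4 items, NA = missing
resp <- matrix(c(1, 0, 1, 1,
                 0, NA, 1, 0,
                 1, 1, NA, 1,
                 0, 0, 0, 0,
                 1, 1, 1, NA),
               nrow = 5, byrow = TRUE,
               dimnames = list(NULL, paste0("I", 1:4)))

# "LW"-style listwise deletion: keep only persons with complete responses
resp.lw <- resp[complete.cases(resp), , drop = FALSE]

# "PM"-style person-mean imputation: replace each NA with the rounded mean
# of that person's observed responses (the rounding rule is an assumption)
resp.pm <- resp
for (i in seq_len(nrow(resp.pm))) {
  miss <- is.na(resp.pm[i, ])
  if (any(miss)) {
    resp.pm[i, miss] <- round(mean(resp.pm[i, !miss]))
  }
}

nrow(resp.lw)    # number of persons retained after listwise deletion
anyNA(resp.pm)   # checks that no missing values remain after imputation
```

Listwise deletion shrinks the sample, which also shrinks the strata used for matching; imputation keeps every person but relies on the imputation model being reasonable for the data at hand.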
Details
This function creates contingency tables.
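To make the idea of tables "by stratum" concrete, the following base R sketch builds Group by response tables for one item, using the observed total score as the matching variable (the documented behavior when Stratum=NULL). The data are simulated and the object names are hypothetical; this is not the package's implementation.

```r
set.seed(1)  # hypothetical simulated data, for illustration only
n <- 200
resp <- as.data.frame(matrix(rbinom(n * 6, 1, 0.6), nrow = n,
                             dimnames = list(NULL, paste0("I", 1:6))))
group <- rep(c(0, 1), each = n / 2)

# With no stratum supplied, the observed total score is the matching variable
stratum <- rowSums(resp)

# Group x response table for item I1 within each stratum
tab.I1 <- table(Group = group,
                Response = factor(resp$I1, levels = c(0, 1)),
                Stratum = stratum)

dim(tab.I1)     # 2 x 2 x (number of observed total scores)
tab.I1[, , 1]   # the 2 x 2 table for the lowest observed stratum
```

Fixing the factor levels of the response keeps every table 2 x 2 even when a category is never observed in a stratum, which makes such strata easy to detect afterwards.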
Value
A list of strata statistics, contingency tables, etc.
Strata.stats |
Summary statistics for each item: n.valid.strata, n.valid.category, and also sample sizes for each stratum across items. |
c.table.list.all |
A list that contains all contingency tables across items and strata. |
c.table.list.valid |
A list that contains only valid contingency tables across items and strata. Strata that have missing item response categories or zero marginal means are removed. |
data.out |
A cleaned data set with variables "Group", "Group.factor","Cluster", "Stratum", and all item responses (with missing data handled). |
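The distinction between "all" and "valid" tables can be illustrated as follows. The sketch takes one plausible reading of the rule, treating a stratum as valid only when every row and column margin of its table is positive; the counts are invented and the package's exact criterion may differ.

```r
# Hypothetical 2 x 2 x 3 array of Group x Response counts across three strata
tabs <- array(c(10, 8, 5, 7,    # stratum S1: all margins positive
                 6, 0, 4, 0,    # stratum S2: group G2 has no respondents
                 9, 7, 0, 0),   # stratum S3: response category 1 never observed
              dim = c(2, 2, 3),
              dimnames = list(Group = c("G1", "G2"),
                              Response = c("0", "1"),
                              Stratum = c("S1", "S2", "S3")))

# A stratum is kept only if no row or column margin is zero (assumed rule)
is.valid <- apply(tabs, 3, function(tb) {
  all(rowSums(tb) > 0) && all(colSums(tb) > 0)
})

valid.tabs <- tabs[, , is.valid, drop = FALSE]
names(which(is.valid))   # names of the strata that are retained
```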
Examples
#Specify the item responses matrix
data(data.adult)
Response.data<-data.adult[,2:13]
#Run the function with specifications
c.table.out<-ContigencyTables(Response.data, Response.code=c(0,1),
Group=data.adult$Group, group.names=NULL,
Stratum=NULL, Cluster=NULL, missing.code="NA",
missing.impute= "LW",print.information = TRUE)
#Obtain results
c.tables.all<-c.table.out$c.table.list.all
c.tables.valid<-c.table.out$c.table.list.valid
c.table.out$Strata.stats
data.use<-c.table.out$data.out
Main function to compute adjusted Mantel-Haenszel statistics
Description
This main function computes both unadjusted and adjusted MH statistics in the presence of clustered data based on Begg (1999) <doi:10.1111/j.0006-341X.1999.00302.x>, Begg & Paykin (2001) <doi:10.1080/00949650108812115>, and French & Finch (2013) <doi:10.1177/0013164412472341>.
Usage
ML.DIF (Response.data, Response.code=c(0,1),Cluster, Group,
group.names=NULL, Stratum=NULL, correct.factor=0.85,
missing.code="NA", missing.impute="LW",
anchor.items=NULL, purification=FALSE,
max.iter=10, alpha = .05)
Arguments
Response.data |
A matrix of scored item responses, supplied as a matrix or data frame. It should not include any other variables (group, stratum, cluster, etc.). |
Response.code |
A numerical vector of all possible item responses. By default, Response.code=c(0,1). |
Cluster |
The cluster variable. Its length should be equal to the sample size of the item response matrix. |
Group |
The variable of group membership (e.g., gender). Its length should be equal to the sample size of the item response matrix. |
group.names |
Names for each defined group (e.g., c('Male','Female')). This argument is optional. By default, group.names=NULL. If not provided, group names of "Group.1, Group.2, etc." will be automatically generated. |
Stratum |
The matching variable. By default, Stratum=NULL. If not provided, the observed total score will be used. |
correct.factor |
The correction factor applied to f, the adjustment term in the adjusted MH statistic. The default value is .85. The adjusted MH statistic was found to exhibit low statistical power for DIF detection in some conditions. One solution is to reduce the magnitude of f by multiplying it by a correction factor (e.g., .85, .90, .95). The value of .85 is suggested by French & Finch (2013) <doi:10.1177/0013164412472341>. |
missing.code |
Indication of how missing values were defined in the data. By default, missing.code="NA". |
missing.impute |
The approach used to handle missing item responses. By default, missing.impute="LW", indicating that listwise deletion will be used. Other options include "PM" (person mean or row mean imputation), "IM" (item mean or column mean imputation), "TW" (two-way imputation), "LR" (logistic regression imputation), and "EM" (EM imputation). See the package "TestDataImputation" (https://cran.r-project.org/package=TestDataImputation) for more details. |
anchor.items |
A scored item responses matrix of selected anchor items. This matrix should be a subset of the response data matrix specified above. By default, anchor.items=NULL. |
purification |
Logical (TRUE or FALSE) argument indicating whether purification will be used. By default, purification=FALSE. |
max.iter |
The maximum number of iterations for purification. The default value is 10. |
alpha |
The alpha value used to decide on the DIF items. The default value is .05. |
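The interplay of correct.factor and alpha can be shown with a little arithmetic. The sketch assumes, following the general form of Begg's adjustment, that the adjusted statistic is the unadjusted MH chi-square divided by f and is referred to a chi-square distribution with 1 degree of freedom. The values of mh.unadj and f are invented, and the package's internal computation may differ in detail.

```r
mh.unadj <- 5.60        # hypothetical unadjusted MH chi-square (1 df)
f <- 1.60               # hypothetical clustering adjustment term f
correct.factor <- 0.85  # default correction factor
alpha <- 0.05           # default significance level

# Shrink f by the correction factor before adjusting the statistic
f.corrected <- correct.factor * f

mh.adj.raw <- mh.unadj / f            # adjusted with the full f (assumed form)
mh.adj.cf  <- mh.unadj / f.corrected  # adjusted with the shrunken f

p.raw <- pchisq(mh.adj.raw, df = 1, lower.tail = FALSE)
p.cf  <- pchisq(mh.adj.cf,  df = 1, lower.tail = FALSE)

# Compare each p-value with alpha to decide whether the item is flagged
c(flag.raw = p.raw < alpha, flag.cf = p.cf < alpha)
```

Because a smaller f yields a larger adjusted statistic, the correction factor makes the adjusted test less conservative; in this invented case the item is flagged only after the correction is applied.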
Details
This main function computes both unadjusted and adjusted Mantel-Haenszel statistics in the presence of multilevel data.
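For a dichotomous item, the unadjusted statistic corresponds to the classical Mantel-Haenszel test on the stack of Group by response tables. As a point of reference, the sketch below computes that classical test with stats::mantelhaen.test on simulated data. The data, the coarse three-level matching variable, and the object names are hypothetical, and the sketch does not reproduce the cluster adjustment described above.

```r
set.seed(42)  # hypothetical simulated data
n <- 400
group <- factor(rep(c("Reference", "Focal"), each = n / 2))
ability <- rnorm(n)

# Six dichotomous items; no DIF is built into the simulation
resp <- sapply(1:6, function(j) rbinom(n, 1, plogis(ability)))
colnames(resp) <- paste0("I", 1:6)

# Coarse matching variable from the total score (three levels, an assumption)
total <- rowSums(resp)
stratum <- cut(total, breaks = c(-Inf, 2, 4, Inf),
               labels = c("low", "mid", "high"))

# Classical (unadjusted) MH test for item I1 across the strata
mh.I1 <- mantelhaen.test(x = group, y = factor(resp[, "I1"]), z = stratum)
mh.I1$statistic   # MH chi-square
mh.I1$p.value     # p-value on 1 degree of freedom
```

This classical test treats respondents as independent, which is the assumption that clustered data violate and that the adjusted statistic is meant to address.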
Value
A list of MH statistics, contingency tables, etc.
MH.values |
Summary of estimated MH statistics and corresponding p-values. Specifically, |
Stratum.statistics |
Summary statistics for each item: n.valid.strata, n.valid.category, and also sample sizes for each stratum across items. |
c.table.list.all |
A list that contains all contingency tables across items and strata. |
c.table.list.valid |
A list that contains only valid contingency tables across items and strata. Strata that have missing item response categories or zero marginal means are removed. |
data.out |
A cleaned data set with variables "Group", "Group.factor","Cluster", "Stratum", and all item responses (with missing data handled). |
References
Begg, M. D. (1999). "Analyzing k (2 × 2) Tables Under Cluster Sampling." Biometrics, 55(1), 302-307. doi:10.1111/j.0006-341X.1999.00302.x.
Begg, M. D. & Paykin, A. B. (2001). "Performance of and Software for a Modified Mantel-Haenszel Statistic for Correlated Data." Journal of Statistical Computation and Simulation, 70(2), 175-195. doi:10.1080/00949650108812115.
French, B. F. & Finch, W. H. (2013). "Extensions of Mantel-Haenszel for Multilevel DIF Detection." Educational and Psychological Measurement, 73(4), 648-671. doi:10.1177/0013164412472341.
Holland, P. W. & Thayer, D. T. (1988). "Differential item performance and the Mantel-Haenszel procedure." In H. Wainer & H. I. Braun (Eds.), Test validity (pp.129-145). Lawrence Erlbaum Associates, Inc.
Examples
#Specify the item responses matrix
data(data.adult)
Response.data<-data.adult[,2:13]
#Run the function with specifications
ML.DIF.out<-ML.DIF (Response.data, Response.code=c(0,1),Cluster=data.adult$Cluster,
Group=data.adult$Group, group.names=c('Reference','Focal'),
Stratum=NULL, correct.factor=0.85,
missing.code="NA", missing.impute="LW",
anchor.items=NULL, purification=FALSE,
max.iter=10, alpha = .05)
#Obtain results
ML.DIF.out$MH.values
ML.DIF.out$Stratum.statistics
Data Example (binary)
Description
This data example contains binary (0/1) responses of 684 participants to 12 items. Participants were classified into 34 clusters and 2 groups.
Usage
data("data.adult")
Format
A data frame with 684 observations on the following 14 variables.
Cluster
The cluster variable
I1
Item 1
I2
Item 2
I3
Item 3
I4
Item 4
I5
Item 5
I6
Item 6
I7
Item 7
I8
Item 8
I9
Item 9
I10
Item 10
I11
Item 11
I12
Item 12
Group
Binary group membership variable
Details
A data set with 14 variables: (1) binary (0/1) responses of 684 participants to 12 items; (2) a cluster indicator variable; and (3) a group indicator variable.
Examples
data(data.adult)
#Inspect the structure of the data set
str(data.adult)
Modified data.adult by removing all strata with zero marginal means.
Description
This data example contains binary (0/1) responses of 684 participants to 12 items. Participants were classified into 10 clusters, 2 groups, and 3 strata.
Usage
data("data.adult.revised")
Format
A data frame with 684 observations on the following 15 variables.
Cluster
The cluster variable
I1
Item 1
I2
Item 2
I3
Item 3
I4
Item 4
I5
Item 5
I6
Item 6
I7
Item 7
I8
Item 8
I9
Item 9
I10
Item 10
I11
Item 11
I12
Item 12
Group
Binary group membership variable
Stratum
A prespecified matching variable with three levels
Details
A data set with 15 variables: (1) binary (0/1) responses of 684 participants to 12 items; (2) a cluster indicator variable with ten levels; (3) a group indicator variable with two levels; and (4) a stratum variable with three levels.
Examples
data(data.adult.revised)
#Inspect the structure of the data set
str(data.adult.revised)
Data Example (Ordinal)
Description
This data example contains ordinal (1/2/3/4) responses of 300 participants to 5 items. Participants were classified into 6 clusters and 2 groups.
Usage
data("data.ordinal")
Format
A data frame with 300 observations on the following 7 variables.
Group
Group membership
Cluster
Cluster membership
I1
Item 1
I2
Item 2
I3
Item 3
I4
Item 4
I5
Item 5
Details
A data set with 7 variables: (1) ordinal (1/2/3/4) responses of 300 participants to 5 items; (2) a cluster indicator variable with six levels; and (3) a group indicator variable with two levels.
Examples
data(data.ordinal)
#Inspect the structure of the data set
str(data.ordinal)