twoxtwo provides a collection of utilities for data analysis with two-by-two contingency tables. The functions in the package let users aggregate and summarize observation-level data as counts.
The two-by-two table is used in epidemiology to summarize count data by combinations of binary exposure and outcome variables as follows:
| | OUTCOME + | OUTCOME - |
|---|---|---|
| EXPOSURE + | A | B |
| EXPOSURE - | C | D |
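The notation in the table above corresponds to the four combinations of exposure and outcome: A is the count of exposed observations with the outcome, B the exposed without the outcome, C the unexposed with the outcome, and D the unexposed without the outcome.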
The package allows for construction of two-by-two tables, as well as direct calculation of measures of effect and hypothesis testing to assess the relationship between the epidemiological exposure and outcome variables.
twoxtwo
The usage demonstration below requires that the twoxtwo and dplyr packages are loaded:
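The loading step itself is not shown here. A minimal sketch, assuming both packages are already installed, would be:

```r
# Attach the two packages named in the text above.
# Assumes both are already installed from a package repository.
library(twoxtwo)
library(dplyr)
```

dplyr is what supplies `tibble()` and the `%>%` pipe used in the examples that follow.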
The data set used to illustrate the twoxtwo functions is observation-level data describing smoking status (exposure) and high blood pressure (outcome). In this example there are 100 smokers and 200 non-smokers; 40 of the smokers and 50 of the non-smokers have high blood pressure:
sh <-
  tibble(
    smoke = c(rep(TRUE, 100), rep(FALSE, 200)),
    hbp = c(rep(1, 40), rep(0, 60), rep(1, 50), rep(0, 150))
  )
sh
# # A tibble: 300 × 2
# smoke hbp
# <lgl> <dbl>
# 1 TRUE 1
# 2 TRUE 1
# 3 TRUE 1
# 4 TRUE 1
# 5 TRUE 1
# 6 TRUE 1
# 7 TRUE 1
# 8 TRUE 1
# 9 TRUE 1
# 10 TRUE 1
# # ℹ 290 more rows
The twoxtwo() constructor function aggregates the observations to counts by exposure and outcome:
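The constructor call is not shown at this point. The sketch below reconstructs it from the `twoxtwo(., exposure = smoke, outcome = hbp, ...)` form used later in this document and from the object name `sh_2x2` printed next; treat it as a likely form of the call rather than the original chunk.

```r
library(twoxtwo)
library(dplyr)

# Same observation-level data as above: 100 smokers, 200 non-smokers.
sh <- tibble(
  smoke = c(rep(TRUE, 100), rep(FALSE, 200)),
  hbp = c(rep(1, 40), rep(0, 60), rep(1, 50), rep(0, 150))
)

# Aggregate to a two-by-two object. The argument names follow the
# twoxtwo() call shown later in this document.
sh_2x2 <-
  sh %>%
  twoxtwo(., exposure = smoke, outcome = hbp)
```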
The twoxtwo object is an S3 class. When printed to the console, it displays the counts in the contingency table:
sh_2x2
# | | |OUTCOME |OUTCOME |
# |:--------|:-----------|:-------|:-------|
# | | |hbp=1 |hbp=0 |
# |EXPOSURE |smoke=TRUE |40 |60 |
# |EXPOSURE |smoke=FALSE |50 |150 |
The object is a list with multiple elements, each of which can be extracted by name if needed.
For example, to view the aggregated counts as a tibble:
sh_2x2$tbl
# # A tibble: 2 × 4
# hbp_1 hbp_0 exposure outcome
# <dbl> <dbl> <chr> <chr>
# 1 40 60 smoke::TRUE/FALSE hbp::1/0
# 2 50 150 smoke::TRUE/FALSE hbp::1/0
To view counts of each cell per the two-by-two notation:
To view the exposure variable and its levels:
To view the outcome variable and its levels:
To view the number of observations missing either exposure or outcome:
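The accessor code for the four items above is not shown, and the element names are not given in this text; since the object is a list, `names(sh_2x2)` will list them. As an independent check, the same quantities can be derived from the raw data in base R. The variable names below (`cells`, `exposure_levels`, `outcome_levels`, `n_missing`) are illustrative and are not claimed to match the package's element names.

```r
# Same observation-level data as above, as a base R data frame.
sh <- data.frame(
  smoke = c(rep(TRUE, 100), rep(FALSE, 200)),
  hbp = c(rep(1, 40), rep(0, 60), rep(1, 50), rep(0, 150))
)

# Cell counts in the A/B/C/D notation (exposure in rows, outcome in columns).
cells <- list(
  A = sum(sh$smoke & sh$hbp == 1),   # exposed, outcome +
  B = sum(sh$smoke & sh$hbp == 0),   # exposed, outcome -
  C = sum(!sh$smoke & sh$hbp == 1),  # unexposed, outcome +
  D = sum(!sh$smoke & sh$hbp == 0)   # unexposed, outcome -
)

# Levels of each variable, ordered "+" first as in the printed table.
exposure_levels <- rev(levels(factor(sh$smoke)))  # "TRUE" "FALSE"
outcome_levels <- rev(levels(factor(sh$hbp)))     # "1" "0"

# Observations missing either exposure or outcome.
n_missing <- sum(is.na(sh$smoke) | is.na(sh$hbp))
```

For this data set the cells come out as A = 40, B = 60, C = 50, D = 150 with no missing observations, matching the printed table.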
And to view the original data (stored in the twoxtwo object by default [1]):
sh_2x2$data
# # A tibble: 300 × 2
# smoke hbp
# <lgl> <dbl>
# 1 TRUE 1
# 2 TRUE 1
# 3 TRUE 1
# 4 TRUE 1
# 5 TRUE 1
# 6 TRUE 1
# 7 TRUE 1
# 8 TRUE 1
# 9 TRUE 1
# 10 TRUE 1
# # ℹ 290 more rows
The S3 class has a summary method, which summarizes the count data and computes measures of effect (odds ratio, risk ratio, and risk difference). When printed, the summary displays the count data, information about the twoxtwo object (missing data and exposure/outcome), and the effect measures:
sh_2x2 %>%
  summary(.)
#
# | | |OUTCOME |OUTCOME |
# |:--------|:-----------|:-------|:-------|
# | | |hbp=1 |hbp=0 |
# |EXPOSURE |smoke=TRUE |40 |60 |
# |EXPOSURE |smoke=FALSE |50 |150 |
#
#
# Outcome: hbp
# Outcome + : 1
# Outcome - : 0
#
# Exposure: smoke
# Exposure + : TRUE
# Exposure - : FALSE
#
# Number of missing observations: 0
#
# Odds Ratio: 2 (1.198,3.338)
# Risk Ratio: 1.6 (1.139,2.247)
# Risk Difference: 0.15 (0.037,0.263)
When the summary is assigned to an object, it stores a named list with the effect measures:
sh_2x2_sum <-
  sh_2x2 %>%
  summary(.)
#
# | | |OUTCOME |OUTCOME |
# |:--------|:-----------|:-------|:-------|
# | | |hbp=1 |hbp=0 |
# |EXPOSURE |smoke=TRUE |40 |60 |
# |EXPOSURE |smoke=FALSE |50 |150 |
#
#
# Outcome: hbp
# Outcome + : 1
# Outcome - : 0
#
# Exposure: smoke
# Exposure + : TRUE
# Exposure - : FALSE
#
# Number of missing observations: 0
#
# Odds Ratio: 2 (1.198,3.338)
# Risk Ratio: 1.6 (1.139,2.247)
# Risk Difference: 0.15 (0.037,0.263)
sh_2x2_sum
# $odds_ratio
# # A tibble: 1 × 6
# measure estimate ci_lower ci_upper exposure outcome
# <chr> <dbl> <dbl> <dbl> <chr> <chr>
# 1 Odds Ratio 2 1.20 3.34 smoke::TRUE/FALSE hbp::1/0
#
# $risk_ratio
# # A tibble: 1 × 6
# measure estimate ci_lower ci_upper exposure outcome
# <chr> <dbl> <dbl> <dbl> <chr> <chr>
# 1 Risk Ratio 1.6 1.14 2.25 smoke::TRUE/FALSE hbp::1/0
#
# $risk_difference
# # A tibble: 1 × 6
# measure estimate ci_lower ci_upper exposure outcome
# <chr> <dbl> <dbl> <dbl> <chr> <chr>
# 1 Risk Difference 0.15 0.0368 0.263 smoke::TRUE/FALSE hbp::1/0
do.call("rbind", sh_2x2_sum)
# # A tibble: 3 × 6
# measure estimate ci_lower ci_upper exposure outcome
# * <chr> <dbl> <dbl> <dbl> <chr> <chr>
# 1 Odds Ratio 2 1.20 3.34 smoke::TRUE/FALSE hbp::1/0
# 2 Risk Ratio 1.6 1.14 2.25 smoke::TRUE/FALSE hbp::1/0
# 3 Risk Difference 0.15 0.0368 0.263 smoke::TRUE/FALSE hbp::1/0
Note that the measures of effect are only computed in the twoxtwo() summary if the “retain” argument is set to TRUE.
Individual measures of effect (odds ratio, risk ratio, and risk difference) can be calculated directly. Each measure includes the point estimate and a confidence interval based on the specified α and the standard error of the estimate. If the user passes a twoxtwo object into a data analysis function, the exposure and outcome are inherited:
sh_2x2 %>%
  odds_ratio()
# # A tibble: 1 × 6
# measure estimate ci_lower ci_upper exposure outcome
# <chr> <dbl> <dbl> <dbl> <chr> <chr>
# 1 Odds Ratio 2 1.20 3.34 smoke::TRUE/FALSE hbp::1/0
sh_2x2 %>%
  risk_ratio()
# # A tibble: 1 × 6
# measure estimate ci_lower ci_upper exposure outcome
# <chr> <dbl> <dbl> <dbl> <chr> <chr>
# 1 Risk Ratio 1.6 1.14 2.25 smoke::TRUE/FALSE hbp::1/0
sh_2x2 %>%
  risk_diff()
# # A tibble: 1 × 6
# measure estimate ci_lower ci_upper exposure outcome
# <chr> <dbl> <dbl> <dbl> <chr> <chr>
# 1 Risk Difference 0.15 0.0368 0.263 smoke::TRUE/FALSE hbp::1/0
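The three estimates and intervals above can be reproduced by hand from the cell counts using the standard Wald formulas. This is a sketch of the textbook calculation, not a statement about how the package computes them internally; the variable names are illustrative.

```r
# Cell counts from the example table.
A <- 40; B <- 60; C <- 50; D <- 150
alpha <- 0.05
z <- qnorm(1 - alpha / 2)  # critical value, about 1.96 for alpha = 0.05

# Odds ratio: interval built on the log scale.
or <- (A * D) / (B * C)
or_se <- sqrt(1 / A + 1 / B + 1 / C + 1 / D)
or_ci <- exp(log(or) + c(-1, 1) * z * or_se)

# Risk ratio: risk in the exposed over risk in the unexposed.
r1 <- A / (A + B)
r0 <- C / (C + D)
rr <- r1 / r0
rr_se <- sqrt(1 / A - 1 / (A + B) + 1 / C - 1 / (C + D))
rr_ci <- exp(log(rr) + c(-1, 1) * z * rr_se)

# Risk difference: interval built on the natural scale.
rd <- r1 - r0
rd_se <- sqrt(r1 * (1 - r1) / (A + B) + r0 * (1 - r0) / (C + D))
rd_ci <- rd + c(-1, 1) * z * rd_se

round(c(or, or_ci), 3)  # 2.000 1.198 3.338
round(c(rr, rr_ci), 3)  # 1.600 1.139 2.247
round(c(rd, rd_ci), 3)  # 0.150 0.037 0.263
```

Changing `alpha` widens or narrows all three intervals through `z`, which is the role the α argument plays in the text above.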
Alternatively, users can directly perform data analysis without first creating a twoxtwo object:
sh %>%
  odds_ratio(., exposure = smoke, outcome = hbp, alpha = 0.05)
# # A tibble: 1 × 6
# measure estimate ci_lower ci_upper exposure outcome
# <chr> <dbl> <dbl> <dbl> <chr> <chr>
# 1 Odds Ratio 2 1.20 3.34 smoke::TRUE/FALSE hbp::1/0
As with measures of effect, hypothesis tests (Fisher’s exact test for count data and Pearson’s χ² test) can be run on a twoxtwo object:
sh_2x2 %>%
  fisher()
# # A tibble: 1 × 9
# test estimate ci_lower ci_upper statistic df pvalue exposure outcome
# <chr> <dbl> <dbl> <dbl> <lgl> <lgl> <dbl> <chr> <chr>
# 1 Fisher's E… 2.00 1.16 3.44 NA NA 0.0108 smoke::… hbp::1…
sh_2x2 %>%
  chisq()
# # A tibble: 1 × 9
# test estimate ci_lower ci_upper statistic df pvalue exposure outcome
# <chr> <lgl> <lgl> <lgl> <dbl> <int> <dbl> <chr> <chr>
# 1 Pearson's … NA NA NA 6.45 1 0.0111 smoke::… hbp::1…
Or without first creating a twoxtwo object:
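The direct calls are not shown here. The sketch below mirrors the `odds_ratio(., exposure = smoke, outcome = hbp)` form used earlier; that `fisher()` and `chisq()` accept the same `exposure` and `outcome` arguments is an assumption based on that example. Base R's `fisher.test()` and `chisq.test()` are included as an independent cross-check on the same counts.

```r
library(twoxtwo)
library(dplyr)

# Same observation-level data as above.
sh <- tibble(
  smoke = c(rep(TRUE, 100), rep(FALSE, 200)),
  hbp = c(rep(1, 40), rep(0, 60), rep(1, 50), rep(0, 150))
)

# Direct calls on the raw data (argument names assumed, see above).
sh %>%
  fisher(., exposure = smoke, outcome = hbp)

sh %>%
  chisq(., exposure = smoke, outcome = hbp)

# Cross-check in base R. matrix() fills column by column, so the rows
# are smoke = TRUE (40, 60) and smoke = FALSE (50, 150).
tab <- matrix(
  c(40, 50, 60, 150),
  nrow = 2,
  dimnames = list(smoke = c("TRUE", "FALSE"), hbp = c("1", "0"))
)

ft <- fisher.test(tab)
# chisq.test() applies Yates' continuity correction to 2 x 2 tables by default.
ct <- chisq.test(tab)
ct$statistic  # about 6.45 with the correction
```

The χ² statistic of 6.45 reported above equals the continuity-corrected value for these counts; without the correction (`correct = FALSE` in base R) the statistic would be about 7.14.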
twoxtwo
All processing of exposure and outcome requires that both variables have exactly two levels. By default, variables are coerced to factors and their levels reversed. The result is that, as in the example presented above, a value of TRUE or 1 will be oriented as exposure or outcome “+” and a corresponding value of FALSE or 0 will be oriented as exposure or outcome “-”.
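The effect of coercing to a factor and reversing the levels can be seen in base R. This is a small illustration of the ordering behavior described above, not the package's own code; the vectors are made up for the example.

```r
smoke <- c(TRUE, FALSE, TRUE)
hbp <- c(1, 0, 0)

# factor() sorts levels in ascending order: FALSE before TRUE, 0 before 1.
levels(factor(smoke))  # "FALSE" "TRUE"
levels(factor(hbp))    # "0" "1"

# Reversing the level order puts TRUE and 1 first, i.e. in the "+" position.
smoke_f <- factor(smoke, levels = rev(levels(factor(smoke))))
hbp_f <- factor(hbp, levels = rev(levels(factor(hbp))))

levels(smoke_f)  # "TRUE" "FALSE"
levels(hbp_f)    # "1" "0"
```

The same reversal applies to character variables, where the default ordering is alphabetical, so it is worth checking which value ends up as "+" before interpreting an effect measure.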
The twoxtwo() constructor function is flexible enough to allow user-specified ordering via a named list passed to the “levels” argument:
sh %>%
  twoxtwo(.,
          exposure = smoke,
          outcome = hbp,
          levels = list(exposure = c(FALSE, TRUE), outcome = c(1, 0)))
# | | |OUTCOME |OUTCOME |
# |:--------|:-----------|:-------|:-------|
# | | |hbp=1 |hbp=0 |
# |EXPOSURE |smoke=FALSE |50 |150 |
# |EXPOSURE |smoke=TRUE |40 |60 |
As mentioned above, the twoxtwo() function is abstracted in the other analysis functions. Each of these functions accepts all arguments that can be passed to twoxtwo, including the “levels” parameter:
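The example for this is not shown. A sketch that combines the `odds_ratio()` call and the `levels` list from the earlier examples might look like the following; the combination is an assumption drawn from the statement above rather than a call taken from this text.

```r
library(twoxtwo)
library(dplyr)

# Same observation-level data as above.
sh <- tibble(
  smoke = c(rep(TRUE, 100), rep(FALSE, 200)),
  hbp = c(rep(1, 40), rep(0, 60), rep(1, 50), rep(0, 150))
)

# Odds ratio with non-smokers oriented as exposure "+".
or_flipped <-
  sh %>%
  odds_ratio(.,
             exposure = smoke,
             outcome = hbp,
             levels = list(exposure = c(FALSE, TRUE), outcome = c(1, 0)))

or_flipped$estimate

# Hand check from the reordered table (A = 50, B = 150, C = 40, D = 60).
(50 * 60) / (150 * 40)  # 0.5
```

With the exposure levels swapped, the expected odds ratio is the reciprocal of the original estimate of 2.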
[1] Users can override this behavior with twoxtwo(..., retain = FALSE).