Countable Histograms with gf_squareplot()

library(coursekata)

Overview

gf_squareplot() creates histograms where individual data points are visible as stacked unit rectangles. Instead of abstract bars, each observation becomes a countable square, making sample size and distribution shape tangible.

This is particularly useful for teaching statistical concepts like sampling distributions and hypothesis testing, where students benefit from seeing that “n = 47” means 47 actual squares.

Basic Usage

Pass a formula and data frame, just like other gf_* functions:

gf_squareplot(~Thumb, data = Fingers)

Display Modes

The bars parameter controls how the histogram is drawn. For example, bars = "outline" selects the outline display mode:

gf_squareplot(~Thumb, data = Fingers, bars = "outline")

Customizing Appearance

You can customize fill color, binwidth, and axis limits:

gf_squareplot(~Thumb, data = Fingers,
              fill = "coral",
              binwidth = 5,
              xrange = c(30, 90))
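To see how binwidth and xrange interact, the following base R sketch lays out the bin edges by hand and tallies how many squares would land in each column. It assumes bins are aligned to the lower axis limit, which may not match how gf_squareplot() places its bins, and thumb_like is simulated stand-in data, not the Fingers measurements.

```r
# Illustrative settings mirroring the call above
binwidth <- 5
xrange <- c(30, 90)

# Assumed alignment: bin edges start at the lower axis limit
edges <- seq(xrange[1], xrange[2], by = binwidth)
n_columns <- length(edges) - 1

# Hypothetical measurements standing in for Thumb (simulated, not real data)
set.seed(1)
thumb_like <- runif(40, min = 30, max = 90)

# Each count is the number of unit squares stacked in that column
counts <- table(cut(thumb_like, breaks = edges, include.lowest = TRUE))
print(counts)
```

The counts sum to the sample size, which is the property that makes the plot countable: one square per observation.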

Integer Data

For integer-valued data with a small range, gf_squareplot() automatically selects a binwidth of 1, so each integer gets its own column:

set.seed(42)  # make the simulated rolls reproducible
int_data <- data.frame(rolls = sample(1:6, 30, replace = TRUE))
gf_squareplot(~rolls, data = int_data)
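With a binwidth of 1, the height of each column is simply the tally for that integer. The base R sketch below reproduces that tally and shows one way to test whether a variable is integer-valued. It is an illustration of the idea, not the check gf_squareplot() actually performs.

```r
set.seed(1)
rolls <- sample(1:6, 30, replace = TRUE)

# Integer-valued check: every value equals its rounded self
is_integer_valued <- all(rolls == round(rolls))

# One column per integer; the factor keeps faces that were never rolled
counts <- table(factor(rolls, levels = 1:6))
print(counts)
```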

Large Samples

When any bin has more than 75 observations, the function automatically switches to solid bars to keep the display readable. You can opt into subdivision instead with auto_subdivide = TRUE, which splits wide bins into sub-columns so rectangles remain countable:

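large_data <- data.frame(x = rnorm(500, mean = 50, sd = 10))
gf_squareplot(~x, data = large_data)                         # default: solid bars if any bin exceeds 75
gf_squareplot(~x, data = large_data, auto_subdivide = TRUE)  # opt in to countable sub-columns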
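To anticipate which display you will get, you can tally the bins yourself and compare the tallest one against the threshold of 75. The sketch below uses base R and assumes a binwidth of 5 aligned to multiples of 5. The function may choose different bins, so treat the result as a rough guide. For a normal distribution with mean 50 and sd 10, the bin from 45 to 50 is expected to hold about 19% of the data, or roughly 96 of 500 observations.

```r
set.seed(1)
x <- rnorm(500, mean = 50, sd = 10)

binwidth <- 5  # assumed for illustration; the function may choose differently

# Bin edges snapped outward to multiples of the binwidth
edges <- seq(floor(min(x) / binwidth) * binwidth,
             ceiling(max(x) / binwidth) * binwidth,
             by = binwidth)
counts <- table(cut(x, breaks = edges, include.lowest = TRUE))

# Compare the tallest bin with the threshold described above
max_count <- max(counts)
exceeds_threshold <- max_count > 75
print(max_count)
print(exceeds_threshold)
```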

Teaching Features

Mean Line

Show a dashed line at the sample mean:

gf_squareplot(~Thumb, data = Fingers, show_mean = TRUE)

DGP Overlay

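The show_dgp = TRUE option adds a teaching overlay for hypothesis testing contexts. The example below applies it to a sampling distribution of 100 slope estimates, each computed from a random sample of 30 rows: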

set.seed(42)
samp_dist <- do(100) * b1(Thumb ~ Height, data = sample(Fingers, 30))
gf_squareplot(~b1, data = samp_dist,
              show_dgp = TRUE,
              show_mean = TRUE,
              xrange = c(-0.5, 1.5),
              xbreaks = seq(-0.5, 1.5, by = 0.25))
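For readers who want to see what the do(100) * b1(...) line computes, here is a rough base R equivalent. It assumes sample() on a data frame draws rows without replacement and that b1() returns the fitted slope. The population data frame is simulated as a hypothetical stand-in for Fingers, and slope_from_sample() is a helper written for this sketch, not part of coursekata.

```r
set.seed(42)

# Hypothetical population standing in for Fingers (simulated values)
population <- data.frame(Height = rnorm(150, mean = 66, sd = 4))
population$Thumb <- 20 + 0.6 * population$Height + rnorm(150, sd = 8)

# Draw n rows without replacement and return the fitted slope
slope_from_sample <- function(data, n) {
  rows <- sample(nrow(data), n)
  fit <- lm(Thumb ~ Height, data = data[rows, ])
  unname(coef(fit)["Height"])
}

# 100 slopes, one per random sample of 30 rows
samp_dist <- data.frame(b1 = replicate(100, slope_from_sample(population, 30)))
summary(samp_dist$b1)
```

Each of the 100 values in samp_dist$b1 becomes one square in the plot, so the histogram shows the sample-to-sample variation in the slope directly.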

Factor Input

When the input is a factor with numeric levels, all levels are displayed on the x-axis even if some have zero counts:

ratings <- factor(sample(1:5, 20, replace = TRUE, prob = c(1, 2, 4, 2, 1)),
                  levels = 1:5)
df <- data.frame(rating = ratings)
gf_squareplot(~rating, data = df)
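The reason empty levels can still appear is that a factor stores its full set of levels, whether or not each one occurs in the data. A small base R sketch with hand-picked values makes this visible:

```r
# Levels 3 and 5 are declared but never observed
ratings <- factor(c(1, 2, 2, 4), levels = 1:5)

# table() keeps every declared level, including those with zero counts
counts <- table(ratings)
print(counts)

# Dropping unused levels would lose the empty columns
counts_dropped <- table(droplevels(ratings))
print(counts_dropped)
```

Declaring levels = 1:5 when building the factor is therefore what guarantees a complete axis, even for a sample that happens to miss a rating.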