---
title: "The Shell Game: conceptual framework"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{The Shell Game: conceptual framework}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
```

# The Shell Game

**Swapping the variable without changing the name or the formula.**

Same column name.  
Different underlying quantity.

**That's the shell game.**

## What the Shell Actually Is

The shell **isn't** geography.  
The shell **is** the variable label.

If this were just geometric mismatch, we would have:

- rounding error
- boundary error
- aggregation perturbation

But because the estimate silently shifts, what we're actually doing is:

**Imputing population via a proxy and continuing to treat it as observed.**

This is how the distortion compounds.

## The Critical Shift

**We're no longer propagating data.**  
**We're propagating assumptions.**

Assumptions are sticky. They don't decay; they **accumulate**.

When you apply standard administrative crosswalks in succession, each step amplifies the prior steps' assumptions and allocation choices, and adds its own. In this case study, `shellgame` uses the HUD crosswalk and its TOT_RATIO field, which allocates values according to the distribution of addresses, treating that distribution as a proxy for population. The resulting estimates retain the population label even though they no longer represent directly observed population quantities. As a result, successive users inherit increasingly imputed values while treating them as empirical measurements.
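
A minimal sketch of how this accumulates, using invented numbers: the unit, the starting population, and both ratios below are hypothetical and are not taken from any crosswalk. The point is only that the column keeps its name while its contents become an estimate of an estimate.

```{r}
# Hypothetical single unit with an observed population.
d <- data.frame(unit = "Z1", population = 1000)

# Two successive crosswalk hops, each with an invented allocation ratio.
hop1_ratio <- 0.8
hop2_ratio <- 0.9

d$population <- d$population * hop1_ratio  # now an estimate, same column name
d$population <- d$population * hop2_ratio  # an estimate of an estimate

names(d)      # the label has not changed
d$population  # the quantity has
```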

## The Transformation Chain

I am going to quantify mismatch **at each hop**, nothing more, nothing less.

**ZCTA → ZIP → COUNTY**

### Pre-allocation

Some Census tabulation areas are being asked to stand in for **multiple postal service geographies**, with no guidance on how population should be divided.

**This is the exact moment analysts silently switch from "joining data" to "inventing rules."**

At this stage, we:

**have not:**

- applied TOT_RATIO
- averaged anything
- redistributed population

**have only:**

- expanded relationships

**Yet already:**

- The unit count has changed
- The geography has fragmented
- The analytical surface has shifted

Any downstream "fix" is compensating for **damage already done**, not refining a neutral process.
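
The effect of "only expanding relationships" can be sketched with a toy join. Everything below is hypothetical (the ZCTA and ZIP identifiers, the populations, and the lookup itself). It is meant to illustrate the mechanism, not to reproduce the package's data.

```{r}
# Hypothetical toy data: three ZCTAs with made-up populations.
zcta_pop <- data.frame(
  zcta = c("Z1", "Z2", "Z3"),
  population = c(1000, 2000, 3000)
)

# Hypothetical lookup: Z2 is associated with two ZIPs, Z3 with three.
zcta_zip <- data.frame(
  zcta = c("Z1", "Z2", "Z2", "Z3", "Z3", "Z3"),
  zip  = c("P1", "P2", "P3", "P4", "P5", "P6")
)

# A plain join: no ratio applied, nothing averaged or redistributed.
expanded <- merge(zcta_pop, zcta_zip, by = "zcta")

nrow(zcta_pop)            # units before the join
nrow(expanded)            # units after the join
sum(zcta_pop$population)  # original total
sum(expanded$population)  # inflated: each ZIP row repeats its ZCTA's full population
```

Without a rule for dividing population among the matched ZIPs, the joined table is already wrong as a population table, which is why an allocation rule gets invented next.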

## First Claim

We can already say, truthfully and precisely:

*"Using a lookup-style ZCTA-ZIP association increases the number of spatial units representing a county by 32% (74 ZCTAs to 98 ZIPs), prior to any allocation or weighting."*
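
The 32% figure follows directly from the two counts quoted above. As a quick check (variable names are illustrative):

```{r}
n_zcta <- 74  # ZCTAs representing the county (from the text)
n_zip  <- 98  # ZIPs after the lookup-style association (from the text)

pct_increase <- (n_zip - n_zcta) / n_zcta * 100
round(pct_increase)
```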

## The Two Hidden Decisions

### Decision 1: Membership Definition

**Are we defining membership by administrative linkage, or by geometric contact?**

- 74 ZCTAs (relationship-based membership used in ACS)
- vs 94 ZCTAs (geometric intersection with county boundary)
- 20 ZCTAs excluded before any transformation begins

This affects the baseline before any allocation occurs.
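
A small sketch of why the membership definition matters, using hypothetical identifiers rather than real ZCTA codes. The two vectors stand in for the two definitions, and the set difference is the group of units that silently drops out of the baseline.

```{r}
# Hypothetical ZCTA identifiers, for illustration only.
relationship_based <- c("A", "B", "C")            # administrative linkage
geometric          <- c("A", "B", "C", "D", "E")  # polygons touching the county boundary

# Units that exist under one definition but not the other.
dropped <- setdiff(geometric, relationship_based)
dropped

# The counts quoted above differ in the same way.
94 - 74
```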

### Decision 2: Crosswalk Selection

When you write "Used HUD crosswalk 3rd quarter 2024", what decisions were actually made?

- Choice of crosswalk source (HUD vs Census vs commercial)
- Choice of time period (Q3 2024 vs Q4 2024 vs 2023)
- Choice of allocation method (TOT_RATIO vs RES_RATIO vs BUS_RATIO)

Each choice produces different results, but they're rarely acknowledged as methodological decisions.
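
To make the allocation-method choice concrete, here is a toy comparison. The crosswalk rows, ratio values, and population are all invented. Only the column names echo the ratio fields listed above.

```{r}
# Hypothetical crosswalk rows for one ZIP that straddles two counties.
# The ratio values are made up for illustration.
xwalk <- data.frame(
  zip       = c("P1", "P1"),
  county    = c("C1", "C2"),
  TOT_RATIO = c(0.70, 0.30),
  RES_RATIO = c(0.60, 0.40)
)
zip_population <- 1000  # made-up value

# Same input, same join, two different ratio columns.
allocated <- data.frame(
  county = xwalk$county,
  by_tot = zip_population * xwalk$TOT_RATIO,
  by_res = zip_population * xwalk$RES_RATIO
)
allocated

# County-level disagreement caused purely by the choice of ratio column.
allocated$by_tot - allocated$by_res
```

Both allocations sum to the same ZIP total, so the difference is invisible unless the county-level results are compared side by side.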

## The Result

For Hennepin County, Minnesota:

**Input**: Population at ZCTA level  
**Transformation**: ZCTA → ZIP → County  
**Output**: Population at each level  
**Result**: delta(Population) = Baseline - Recovered

- **Baseline**: 1,391,557 (74 ZCTAs, directly observed from ACS)
- **After ZCTA→ZIP→County**: 1,216,874 (recovered via HUD TOT_RATIO)
- **Perturbation**: 174,683 fewer people (-12.6% relative to baseline)
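
The perturbation can be recomputed from the two totals reported above. The variable names are illustrative, and the numbers are copied from the text.

```{r}
baseline  <- 1391557  # 74 ZCTAs, from the text
recovered <- 1216874  # after ZCTA -> ZIP -> County, from the text

delta <- baseline - recovered
delta

# Signed change relative to baseline, in percent.
round(-delta / baseline * 100, 1)
```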

## Why This Is Agnostic

The shell game happens regardless of:

- **Which variable**: Population, median income, vehicle ownership - same % error
- **Which tool**: R, Python, Stata, ArcGIS - same % error
- **Which geography**: Hennepin County or anywhere else - same pattern

**The transformation is the cause, not the tool or variable.**

The shell game happens regardless of who's shuffling the shells.

## What This Package Does

`shellgame` provides tools to:

1. **Quantify** the error at each transformation hop
2. **Reveal** where the swap from observed to imputed data occurs
3. **Demonstrate** that the error is agnostic to inputs
4. **Document** the hidden decisions in the workflow

Use it to audit yourself. Use it to understand what's really happening when you transform geographic data.

**Same column name. Different underlying quantity. That's the shell game.**
