Getting started with BEMPdata

Overview

The BEMPdata package provides access to the Bangladesh Environmental Mobility Panel (BEMP), a household panel survey on environmental migration along the Jamuna River in Bangladesh (2021–2024). The panel follows 1,691 households over 14 survey rounds (4 annual in-person waves and 10 bi-monthly phone waves). The data are distributed as 20 wave datasets and comprise 24,279 completed surveys.

Data are hosted on Zenodo and downloaded on demand. Files are cached locally after the first download, so subsequent calls read from the cache instead of downloading again.

Installation

# Install from GitHub (requires the remotes package)
# install.packages("remotes")
remotes::install_github("janfreihardt/BEMPdata")

Wave structure

The package includes a built-in overview of all 20 wave datasets:

library(BEMPdata)

wave_overview

# In-person waves only
wave_overview[wave_overview$type == "in-person", ]

Wave identifiers follow the pattern w{round}[_M|_N|_V]:

Suffix    Meaning
(none)    Main household questionnaire
_M        Migrant questionnaire
_N        Non-migrant questionnaire
_V        Village profile questionnaire
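The naming pattern can be unpacked programmatically, which is handy when looping over several waves. The following is a minimal base-R sketch; `parse_wave_id()` is a hypothetical helper written for this example and is not part of BEMPdata.

```r
# Hypothetical helper (not part of BEMPdata): split wave identifiers of the
# form w{round}[_M|_N|_V] into a round number and a questionnaire type.
parse_wave_id <- function(ids) {
  valid <- grepl("^w[0-9]+(_[MNV])?$", ids)
  if (!all(valid)) {
    stop("Unrecognised wave identifier(s): ",
         paste(ids[!valid], collapse = ", "))
  }

  # Round number: the digits directly after the leading "w"
  round <- as.integer(sub("^w([0-9]+).*$", "\\1", ids))

  # Suffix: whatever remains after "w{round}" ("" for the main questionnaire)
  suffix <- sub("^w[0-9]+", "", ids)
  suffix_labels <- c(`_M` = "migrant",
                     `_N` = "non-migrant",
                     `_V` = "village profile")
  questionnaire <- unname(suffix_labels[suffix])
  questionnaire[suffix == ""] <- "main household"

  data.frame(wave = ids, round = round, questionnaire = questionnaire,
             stringsAsFactors = FALSE)
}

parse_wave_id(c("w1", "w6_M", "w14_N"))
```

The helper stops with an error on identifiers that do not match the pattern, so typos such as a wrong suffix letter surface early.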

Downloading wave data

Use get_wave() to download and load a wave. The first call downloads the full CSV archive (~6 MB) from Zenodo; all subsequent calls use the local cache.

# Baseline in-person wave (2021)
w1 <- get_wave("w1")
head(w1)

# Wave 6, migrant questionnaire
w6_migrant <- get_wave("w6_M")

# Wave 14, non-migrant questionnaire, in Stata format (with value labels)
w14_nm <- get_wave("w14_N", format = "dta")
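The download-once behaviour described above follows a common caching pattern. The sketch below illustrates that pattern in base R; it is not the package's actual implementation. `cached_file()` and `fake_fetch()` are hypothetical names, and the "download" is simulated by writing a small CSV to a temporary directory.

```r
# Hypothetical helper: return the path to a cached file, calling `fetch`
# (a stand-in for the real download step) only if the file is missing.
cached_file <- function(name, cache_dir, fetch) {
  path <- file.path(cache_dir, name)
  if (!file.exists(path)) {
    dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
    fetch(path)
  }
  path
}

# Simulated download: writes a tiny CSV and counts how often it is called
n_downloads <- 0
fake_fetch <- function(dest) {
  n_downloads <<- n_downloads + 1
  write.csv(data.frame(id = 1:3), dest, row.names = FALSE)
}

# Start from an empty demo cache inside the session's temporary directory
cache_dir <- file.path(tempdir(), "bemp-cache-demo")
unlink(cache_dir, recursive = TRUE)

first  <- read.csv(cached_file("w1.csv", cache_dir, fake_fetch))  # fetches
second <- read.csv(cached_file("w1.csv", cache_dir, fake_fetch))  # cache hit

n_downloads  # the fetch function ran once for two reads
```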

Working with codebooks

Look up a variable by keyword

# Find all variables related to income
lookup_variable("income")

# Search only in variable labels
lookup_variable("migrat", fields = "label")

# Use a regular expression
lookup_variable("flood|erosion")
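To see what a pattern such as "flood|erosion" or a word stem such as "migrat" matches, it can help to try it on a small table first. The sketch below uses an invented toy codebook (the variable names and labels are made up) and a hypothetical `search_labels()` helper built on `grepl()`. Case-insensitive matching is an assumption of this sketch, not a documented property of `lookup_variable()`.

```r
# Toy codebook: variable names and labels are invented for illustration
toy_codebook <- data.frame(
  wave = c("w1", "w1", "w6_M", "w14_N"),
  variable_name = c("var_a", "var_b", "var_c", "var_d"),
  variable_label = c("Household income last month",
                     "Flood damage to homestead",
                     "Reason for migration",
                     "Land lost to riverbank erosion"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose label matches a regular expression.
# ignore.case = TRUE is an assumption made for this sketch.
search_labels <- function(cb, pattern) {
  cb[grepl(pattern, cb$variable_label, ignore.case = TRUE), , drop = FALSE]
}

# "|" means "or": labels mentioning either flood or erosion
search_labels(toy_codebook, "flood|erosion")

# A stem matches every word that contains it (migrate, migration, migrant)
search_labels(toy_codebook, "migrat")
```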

Get the full codebook for a wave

# Codebook for the baseline wave
cb_w1 <- get_codebook("w1")
names(cb_w1)

# Merged codebook across all waves
cb_all <- get_codebook("all")
nrow(cb_all)

The pre-built codebook object ships with the package and is available immediately without downloading:

# Available offline
head(codebook[, c("wave", "variable_name", "variable_label", "block")])

Cache management

# Check what is cached and how much space it uses
bemp_cache_info()

# Clear the cache (will prompt for confirmation)
bemp_cache_clear()

Linking waves

The panel respondent code is stored in the registration block of each wave. Here is a minimal example of merging two waves:

library(dplyr)

w1  <- get_wave("w1")
w6n <- get_wave("w6_N")

# Identify the respondent code columns
lookup_variable("respondent code", fields = "label")

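# Merge on the respondent code. Adjust "w1_reg1" to the column name(s) found
# above. If the key is named differently in each wave, use a named vector:
# by = c("key_in_w1" = "key_in_w6n")
panel <- inner_join(w1, w6n, by = "w1_reg1", suffix = c("_w1", "_w6n"))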
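Before relying on a merged panel, it is worth checking that the key is unique within each wave and seeing how many respondents drop out of the join. The sketch below shows these checks on two invented toy tables. The column names `resp_code` and `hh_size` are hypothetical, and base R's `merge()` is used so the example runs without extra packages; `dplyr::inner_join()` and `dplyr::anti_join()` offer the same operations.

```r
# Toy stand-ins for two waves; column names and values are invented
wave_a <- data.frame(resp_code = c("A01", "A02", "A03", "A04"),
                     hh_size   = c(4, 6, 3, 5),
                     stringsAsFactors = FALSE)
wave_b <- data.frame(resp_code = c("A02", "A03", "A05"),
                     hh_size   = c(6, 4, 2),
                     stringsAsFactors = FALSE)

# 1. The key should identify each row uniquely within a wave;
#    duplicated keys would multiply rows in the join.
stopifnot(!anyDuplicated(wave_a$resp_code),
          !anyDuplicated(wave_b$resp_code))

# 2. Inner join: keep respondents observed in both waves. Non-key columns
#    that share a name receive the given suffixes.
linked <- merge(wave_a, wave_b, by = "resp_code", suffixes = c("_a", "_b"))

# 3. Respondents present in the first wave but missing from the second
dropped <- setdiff(wave_a$resp_code, wave_b$resp_code)

linked
dropped
```

An inner join silently discards unmatched respondents, so comparing `nrow(linked)` with the size of each input makes attrition between waves visible.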

Citation

If you use this package or dataset, please cite:

R package:

Freihardt, J. (2026). BEMPdata: R package for the Bangladesh Environmental Mobility Panel. Zenodo. https://doi.org/10.5281/zenodo.18775710

Dataset:

Freihardt, J. et al. (2026). The Bangladesh Environmental Mobility Panel (BEMP): Panel data on (im)mobility, socio-economic, and political impacts of riverbank erosion and flooding in Bangladesh [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.18229498

Data descriptor:

Freihardt, J. et al. (forthcoming). Bangladesh Environmental Mobility Panel (BEMP). [Journal]. DOI: [to be added]